What is Parsing?
String of characters (easy for humans to write)
  → Parser →
Data structure (easy for programs to process)
A parser also checks that the input string is well-formed and, if not, rejects it.
Example 1
CSV (Comma-Separated Values):
Charlton, 49
Lineker, 48
Beckham, 17
  → Parser →
Array of pairs:
(“Charlton”, 49)
(“Lineker”, 48)
(“Beckham”, 17)
Data structure?
A data structure is typically a value of a data type in some programming language, e.g. C.
The type of a player:
typedef struct {
    char* name;
    int goals;
} Player;
The type of a squad of players:
typedef struct {
    Player* players;
    int size;
} Squad;
Why data structures?
Data structures are convenient for a computer program to process.
The total goals scored by all players in a squad:
int total(Squad s)
{
    int i, sum = 0;
    for (i = 0; i < s.size; i++)
        sum += s.players[i].goals;
    return sum;
}
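As a quick check of `total`, here is a minimal sketch that writes Example 1's data directly as a `Squad` value. It repeats the two type definitions and `total` from the slides so that it compiles on its own; the helper name `example_squad` is hypothetical.

```c
/* Types and function as on the slides. */
typedef struct { char* name; int goals; } Player;
typedef struct { Player* players; int size; } Squad;

int total(Squad s)
{
    int i, sum = 0;
    for (i = 0; i < s.size; i++)
        sum += s.players[i].goals;
    return sum;
}

/* Hypothetical helper: Example 1's CSV data, written by hand as a Squad. */
Squad example_squad(void)
{
    static Player players[] = {
        { "Charlton", 49 },
        { "Lineker", 48 },
        { "Beckham", 17 }
    };
    Squad s = { players, 3 };
    return s;
}
```

Calling `total(example_squad())` should give 49 + 48 + 17 = 114. The point of parsing is to obtain such a value from the CSV text instead of writing it by hand.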
Example 1: the problem
We want to be able to parse CSV files into values of type Squad so that we can process them conveniently.
Squad parse(char* input)
{
    ...
}
LSA will teach you how to fill in the dots. (This is a rather easy example, though!)
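One way the dots might be filled in, as a minimal sketch. It assumes each line has the form `name, goals` and ends in a new-line (optional on the last line). The slides do not say how a rejected input is reported, so this sketch assumes rejection is signalled by returning a `Squad` whose `size` is -1; the helper `reject` is hypothetical. Names are copied into freshly allocated strings so the result does not point into `input`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char* name; int goals; } Player;
typedef struct { Player* players; int size; } Squad;

/* Hypothetical helper: frees what was built so far and returns the
   assumed rejection value, a squad of size -1. */
static Squad reject(Squad s)
{
    int i;
    for (i = 0; i < s.size; i++)
        free(s.players[i].name);
    free(s.players);
    s.players = NULL;
    s.size = -1;
    return s;
}

/* Parses lines of the form "name, goals" into a Squad. */
Squad parse(char* input)
{
    Squad s = { NULL, 0 };
    int capacity = 0;
    char* p = input;

    while (*p != '\0') {
        char* start = p;
        char* name;
        char* end;
        long goals;

        /* Name: one or more characters up to the comma. */
        while (*p != ',' && *p != '\n' && *p != '\0')
            p++;
        if (*p != ',' || p == start)
            return reject(s);

        /* Make room for one more player, doubling the array when full. */
        if (s.size == capacity) {
            Player* bigger;
            capacity = capacity == 0 ? 4 : 2 * capacity;
            bigger = realloc(s.players, capacity * sizeof(Player));
            if (bigger == NULL)
                return reject(s);
            s.players = bigger;
        }

        /* Copy the name so the result does not point into input. */
        name = malloc(p - start + 1);
        if (name == NULL)
            return reject(s);
        memcpy(name, start, p - start);
        name[p - start] = '\0';
        p++;                               /* skip the comma */

        /* Goals: optional spaces, then a decimal number. */
        while (*p == ' ')
            p++;
        if (!isdigit((unsigned char)*p)) {
            free(name);
            return reject(s);
        }
        goals = strtol(p, &end, 10);
        p = end;

        /* End of record: a new-line or the end of the input. */
        if (*p == '\n') {
            p++;
        } else if (*p != '\0') {
            free(name);
            return reject(s);
        }

        s.players[s.size].name = name;
        s.players[s.size].goals = (int)goals;
        s.size++;
    }
    return s;
}
```

For example, `parse("Charlton, 49\nLineker, 48\nBeckham, 17\n")` should yield a squad of size 3, while a line without a comma such as `"Charlton 49"` is rejected. Windows-style `\r\n` line endings and overflowing numbers are not handled by this sketch.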
Everyday parsing
Our email clients parse email headers, allowing search by To and From address, etc.
Our web browsers parse HTML, JavaScript, CSS, etc.
Our copies of Call of Duty parse configuration files and saved-game states.
LSA of PLs
In LSA, we are interested in parsing in general.
But we have a special interest in parsing programming languages (PLs). Why?
“If we can parse a PL, we can parse anything.”
In practice, we often want to parse PL-like languages.
Preparation for CGO.
Example 2
A Pascal statement:
foo := 20 + bar
  → Parser →
An abstract syntax tree:
ASSIGN
├── “foo”
└── PLUS
    ├── NUM 20
    └── VAR “bar”
We will return to this example later!
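A minimal sketch of how the abstract syntax tree above might be represented in C. The type and function names (`Expr`, `Assign`, `mk_num`, `mk_var`, `mk_plus`, `show_expr`, `show_assign`, `example2`) are hypothetical. The `show_*` functions write into a buffer that is assumed to be large enough, and allocation failures simply terminate the program to keep the sketch short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Kinds of expression node, following the tree above. */
typedef enum { NUM, VAR, PLUS } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int num;             /* NUM: the number */
    char* var;           /* VAR: the variable name (owned copy) */
    struct Expr* left;   /* PLUS: left operand */
    struct Expr* right;  /* PLUS: right operand */
} Expr;

/* An assignment statement: target := value. */
typedef struct {
    char* target;
    Expr* value;
} Assign;

static char* copy_string(const char* s)
{
    char* c = malloc(strlen(s) + 1);
    if (c == NULL) exit(EXIT_FAILURE);    /* keep the sketch simple */
    strcpy(c, s);
    return c;
}

static Expr* new_expr(ExprKind kind)
{
    Expr* e = malloc(sizeof *e);
    if (e == NULL) exit(EXIT_FAILURE);
    e->kind = kind;
    e->num = 0;
    e->var = NULL;
    e->left = e->right = NULL;
    return e;
}

Expr* mk_num(int n)              { Expr* e = new_expr(NUM); e->num = n; return e; }
Expr* mk_var(const char* v)      { Expr* e = new_expr(VAR); e->var = copy_string(v); return e; }
Expr* mk_plus(Expr* l, Expr* r)  { Expr* e = new_expr(PLUS); e->left = l; e->right = r; return e; }

/* Writes e in prefix form, e.g. PLUS(NUM(20), VAR(bar)); returns its length. */
int show_expr(const Expr* e, char* buf)
{
    int n = 0;
    switch (e->kind) {
    case NUM: return sprintf(buf, "NUM(%d)", e->num);
    case VAR: return sprintf(buf, "VAR(%s)", e->var);
    case PLUS:
        n += sprintf(buf + n, "PLUS(");
        n += show_expr(e->left, buf + n);
        n += sprintf(buf + n, ", ");
        n += show_expr(e->right, buf + n);
        n += sprintf(buf + n, ")");
        return n;
    }
    return 0;
}

int show_assign(const Assign* a, char* buf)
{
    int n = sprintf(buf, "ASSIGN(%s, ", a->target);
    n += show_expr(a->value, buf + n);
    n += sprintf(buf + n, ")");
    return n;
}

/* Example 2: foo := 20 + bar */
Assign example2(void)
{
    Assign a;
    a.target = copy_string("foo");
    a.value = mk_plus(mk_num(20), mk_var("bar"));
    return a;
}
```

Printing `example2()` with `show_assign` should produce `ASSIGN(foo, PLUS(NUM(20), VAR(bar)))`, mirroring the tree: the assignment's target is a plain name, while its value is a `PLUS` node with a `NUM` child and a `VAR` child.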
LSA & CGO
The connection between Lexical and Syntax Analysis (2nd-year module) and Code Generation and Optimisation (3rd-year module).
LSA & CGO
Source Program (String): easy for humans to write and understand
  → LSA →
Source Program (Data structure): easy for the compiler to process
  → CGO →
Target Program: easy for machines to execute
Lexical Analysis
Identifies the lexemes in a sentence.
Lexeme: a minimal meaningful unit of a language.
Converts each lexeme to a token.
Throws away ignorable text such as spaces, new-lines, and comments.
(Also known as “scanning”)
What is a token?
Every token has an identifier, used to denote the kind of lexeme that it represents, e.g.

Token identifier   Denotes a
PLUS               + operator
ASSIGN             := operator
VAR                variable
NUM                number

Some tokens have a component value, conventionally written in parentheses after the identifier, e.g. VAR(foo), NUM(12).
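One possible C encoding of tokens, as a sketch: an enumeration for the token identifier, plus fields holding the component value. The names `Token`, `make_token`, `make_num`, `make_var` and `show_token`, and the 31-character limit on variable names, are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Token identifiers from the table above. */
typedef enum { PLUS, ASSIGN, VAR, NUM } TokenId;

/* A token: an identifier plus an optional component value.
   Assumption: variable names are at most 31 characters. */
typedef struct {
    TokenId id;
    int num;        /* component value of NUM tokens */
    char var[32];   /* component value of VAR tokens */
} Token;

Token make_token(TokenId id)
{
    Token t = { id, 0, "" };
    return t;
}

Token make_num(int n)
{
    Token t = make_token(NUM);
    t.num = n;
    return t;
}

Token make_var(const char* name)
{
    Token t = make_token(VAR);
    strncpy(t.var, name, sizeof t.var - 1);   /* truncates long names */
    t.var[sizeof t.var - 1] = '\0';
    return t;
}

/* Writes the token in the slides' notation, e.g. VAR(foo) or NUM(12). */
void show_token(Token t, char* buf, size_t size)
{
    switch (t.id) {
    case PLUS:   snprintf(buf, size, "PLUS");            break;
    case ASSIGN: snprintf(buf, size, "ASSIGN");          break;
    case VAR:    snprintf(buf, size, "VAR(%s)", t.var);  break;
    case NUM:    snprintf(buf, size, "NUM(%d)", t.num);  break;
    }
}
```

Only `VAR` and `NUM` tokens use their component fields; for `PLUS` and `ASSIGN` the identifier alone carries all the information.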
Lexical Analysis
Stream of characters → Stream of tokens
Example input:
foo := 20 + bar
Example output:
VAR(foo), ASSIGN, NUM(20), PLUS, VAR(bar)
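A hand-written lexer for this example might look like the following minimal sketch. It assumes the lexemes from the token table (`+`, `:=`, variables and numbers), takes the longest possible variable or number at each position, treats spaces, tabs and new-lines as ignorable, does not handle comments, and returns -1 on a character that starts no lexeme. The function name `lex` and its signature are hypothetical.

```c
/* Token identifiers, as in the token table (names are illustrative). */
typedef enum { PLUS, ASSIGN, VAR, NUM } TokenId;

typedef struct {
    TokenId id;
    int num;        /* component value of NUM tokens */
    char var[32];   /* component value of VAR tokens (assumed <= 31 chars) */
} Token;

static int is_digit(char c)  { return c >= '0' && c <= '9'; }
static int is_letter(char c) { return c >= 'a' && c <= 'z'; }

/* Scans s and stores at most max tokens in out.
   Returns the number of tokens, or -1 on a lexical error. */
int lex(const char* s, Token* out, int max)
{
    int n = 0;
    while (*s != '\0') {
        Token t = { PLUS, 0, "" };

        /* Ignorable text: spaces, tabs and new-lines are skipped. */
        if (*s == ' ' || *s == '\t' || *s == '\n') {
            s++;
            continue;
        }
        if (n == max)
            return -1;                        /* output array is full */

        if (*s == '+') {
            t.id = PLUS;
            s++;
        } else if (s[0] == ':' && s[1] == '=') {
            t.id = ASSIGN;
            s += 2;
        } else if (is_digit(*s)) {            /* number = digit . digit* */
            t.id = NUM;
            while (is_digit(*s))
                t.num = t.num * 10 + (*s++ - '0');
        } else if (is_letter(*s)) {           /* variable = letter . (letter | digit)* */
            int len = 0;
            t.id = VAR;
            while (is_letter(*s) || is_digit(*s)) {
                if (len == (int)sizeof t.var - 1)
                    return -1;                /* name too long for this sketch */
                t.var[len++] = *s++;
            }
            t.var[len] = '\0';
        } else {
            return -1;                        /* no lexeme starts here */
        }
        out[n++] = t;
    }
    return n;
}
```

On `"foo := 20 + bar"` this should produce the five tokens VAR(foo), ASSIGN, NUM(20), PLUS, VAR(bar).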
Lexical Analysis
Lexemes are specified by regular expressions. For example:
digit    = 0 | ... | 9
letter   = a | ... | z
number   = digit ⋅ digit*
variable = letter ⋅ (letter | digit)*
Example numbers: 1443634
Example variables: x, foo, foo2, x1y20
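To see why regular expressions can be matched efficiently, the definitions of number and variable above can be hand-translated into a deterministic finite automaton (DFA) that makes exactly one transition per input character. This is a minimal hand-built sketch rather than one generated by a tool such as Flex; it assumes letter means lowercase a..z, as on the slide, and the state names and the functions `step`, `run`, `is_number` and `is_variable` are hypothetical.

```c
/* DFA states: IN_NUMBER and IN_VARIABLE are accepting; DEAD never recovers. */
enum { START, IN_NUMBER, IN_VARIABLE, DEAD };

static int is_digit(char c)  { return c >= '0' && c <= '9'; }
static int is_letter(char c) { return c >= 'a' && c <= 'z'; }

/* Transition function of the DFA. */
static int step(int state, char c)
{
    switch (state) {
    case START:
        if (is_digit(c))  return IN_NUMBER;    /* number   = digit . ... */
        if (is_letter(c)) return IN_VARIABLE;  /* variable = letter . ... */
        return DEAD;
    case IN_NUMBER:
        return is_digit(c) ? IN_NUMBER : DEAD;                      /* digit* */
    case IN_VARIABLE:
        return (is_letter(c) || is_digit(c)) ? IN_VARIABLE : DEAD;  /* (letter | digit)* */
    default:
        return DEAD;
    }
}

/* Runs the DFA over the whole string: one step per character. */
static int run(const char* s)
{
    int state = START;
    for (; *s != '\0'; s++)
        state = step(state, *s);
    return state;
}

int is_number(const char* s)   { return run(s) == IN_NUMBER; }
int is_variable(const char* s) { return run(s) == IN_VARIABLE; }
```

Matching takes time proportional to the length of the input and needs no backtracking, which is the kind of efficiency the later slide on "Why bother with Lexical Analysis?" refers to.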
Syntax Analysis
Syntax: the set of rules defining valid strings of a language.
Syntax analysis converts a stream of symbols to a parse tree:
– a proof that a given input is valid according to the language syntax;
– also a structure-rich representation of the input that is convenient to process.
(Also known as “parsing”)
Syntax Analysis
The syntax of a language is usually specified by a grammar.
Example:
stmt → VAR(v) ASSIGN expr
expr → VAR(v)
     | NUM(n)
     | expr PLUS expr
where v represents any variable name and n any number.
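A recursive-descent recogniser for this grammar, as a minimal sketch. The rule `expr → expr PLUS expr` is left-recursive (and ambiguous), so a naive recursive-descent function for it would call itself forever without consuming input; the sketch therefore uses the equivalent formulation `expr → atom (PLUS atom)*`, implemented with a loop, where `atom` is a hypothetical name for `VAR(v) | NUM(n)`. It checks token identifiers only (component values are ignored), assumes the token array ends with a hypothetical `END` token, and only accepts or rejects without building a parse tree.

```c
/* Token identifiers; END marks the end of the input (an assumption). */
typedef enum { VAR, NUM, PLUS, ASSIGN, END } TokenId;

/* Parser state: the token stream and the current position. */
typedef struct { const TokenId* toks; int pos; } Parser;

static TokenId peek(const Parser* p) { return p->toks[p->pos]; }

/* Consumes the expected token and returns 1, or returns 0 and consumes nothing. */
static int expect(Parser* p, TokenId id)
{
    if (peek(p) != id)
        return 0;
    p->pos++;
    return 1;
}

/* atom -> VAR(v) | NUM(n) */
static int parse_atom(Parser* p)
{
    return expect(p, VAR) || expect(p, NUM);
}

/* expr -> atom (PLUS atom)*   (left recursion replaced by a loop) */
static int parse_expr(Parser* p)
{
    if (!parse_atom(p))
        return 0;
    while (peek(p) == PLUS) {
        p->pos++;
        if (!parse_atom(p))
            return 0;
    }
    return 1;
}

/* stmt -> VAR(v) ASSIGN expr, and the whole input must be consumed. */
int parse_stmt(const TokenId* toks)
{
    Parser p = { toks, 0 };
    return expect(&p, VAR) && expect(&p, ASSIGN) && parse_expr(&p)
        && peek(&p) == END;
}
```

With the token stream of the running example, `{ VAR, ASSIGN, NUM, PLUS, VAR, END }`, `parse_stmt` should accept; a stream such as `{ VAR, ASSIGN, PLUS, VAR, END }` should be rejected.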
Syntax Analysis
Example input (stream of symbols):
VAR(foo), ASSIGN, NUM(20), PLUS, VAR(bar)
Example output (parse tree):
stmt
├── VAR(foo)
├── ASSIGN
└── expr
    ├── expr
    │   └── NUM(20)
    ├── PLUS
    └── expr
        └── VAR(bar)
Syntax Analysis subsumes Lexical Analysis
Any language that can be accepted by a regular expression can be accepted by a grammar.
But not vice versa!*
Hence Syntax Analysis is morepowerful than Lexical Analysis.
* Can anyone give a simple example?
Why bother with Lexical Analysis?
Convenience: regular expressions are more convenient than grammars for defining regular strings.
Efficiency: there are efficient algorithms for matching regular expressions that do not apply in the more general setting of grammars.
Modularity: lexical analysis splits a problem into two smaller problems.
Objectives
To learn how to implement efficient parsers
– using a general-purpose programming language (C);
– using an automatic parser generator (Flex & Bison);
and to learn the theory behind parser generation from regular expressions and grammars.
Practicals
4 practicals in the lab.
The idea is to practise the techniques developed in the lectures.
Room CSE/069.
Spring weeks 9 & 10.
Summer weeks 4 & 5.
Two practical groups: A and B. Find your group on the LSA web page.

Organisation of practicals:
Practical   Topic
1           Lexical Analysis
2           Flex, a Lexical Analyser Generator
3           Recursive Descent Parsing
4           Bison, a Parser Generator
Lecture contents
Chapter   Title
1         Introduction
2         Abstract Syntax
3         Lexical Analysis
4         Flex, a Lexical Analyser Generator
5         Grammars
6         Recursive Descent Parsing
7         Bison, a Parser Generator
8         Top-Down Parsing
9         Bottom-Up Parsing

Organisation of Lectures
14 lectures, with notes arranged into 9 chapters.
A single chapter may be covered in less than, or more than, one lecture.