Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
Programming Language ConceptsLexical and Syntactic Analysis
Janyl Jumadinova24 January, 2017
Most Important Steps in Compilation
I Optional Preprocessing
I Lexical analysis (scanning)
I Syntax analysis (parsing)
I Semantic analysis
I Intermediate code generation
I Optimization (usually machine-independent)
I Final code generation
I Optional final optimization
2/30
Lexical Analysis
3/30
Lexical Analysis
For each token type, give a description:
- either a literal string (e.g., “≤” or “while” to describe an operatoror reserved word),
- or a < rule > (e.g., the rule < unsigned int > might stand for “asequence of one or more digits”; the rule < identifier > might standfor “a letter followed by a sequence of zero or more letters or digits.”
4/30
Lexical Analysis
For each token type, give a description:
- either a literal string (e.g., “≤” or “while” to describe an operatoror reserved word),- or a < rule > (e.g., the rule < unsigned int > might stand for “asequence of one or more digits”; the rule < identifier > might standfor “a letter followed by a sequence of zero or more letters or digits.”
4/30
Lexical Analysis
Lexical analysis produces a “token stream” in which the progam isreduced to a sequence of token types, each with its identifyingnumber and the actual string (in the program) corresponding to it.
5/30
6/30
Syntactic Analysis
I The syntax of a language is described by a grammar thatspecifies the legal combinations of tokens.
I Grammars are often specified in BNF notation (“Backus NaurForm”):
<item1> ::= valid replacements for <item1>
<item2> ::= valid replacements for <item2>
7/30
Syntactic Analysis
I The syntax of a language is described by a grammar thatspecifies the legal combinations of tokens.
I Grammars are often specified in BNF notation (“Backus NaurForm”):
<item1> ::= valid replacements for <item1>
<item2> ::= valid replacements for <item2>
7/30
Syntactic Analysis
I The syntax of a language is described by a grammar thatspecifies the legal combinations of tokens.
I Grammars are often specified in BNF notation (“Backus NaurForm”):
<item1> ::= valid replacements for <item1>
<item2> ::= valid replacements for <item2>
7/30
Syntactic Analysis
8/30
Grammars (Context-free Gramars)
I Collection of VARIABLES (things that can be replaced by otherthings), also called NON-TERMINALS.
I Collection of TERMINALS (“constants”, strings that can’t bereplaced)
I One special variable called the START SYMBOL.
I Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book)
9/30
Grammars (Context-free Gramars)
I Collection of VARIABLES (things that can be replaced by otherthings), also called NON-TERMINALS.
I Collection of TERMINALS (“constants”, strings that can’t bereplaced)
I One special variable called the START SYMBOL.
I Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book)
9/30
Grammars (Context-free Gramars)
I Collection of VARIABLES (things that can be replaced by otherthings), also called NON-TERMINALS.
I Collection of TERMINALS (“constants”, strings that can’t bereplaced)
I One special variable called the START SYMBOL.
I Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book)
9/30
Grammars (Context-free Gramars)
I Collection of VARIABLES (things that can be replaced by otherthings), also called NON-TERMINALS.
I Collection of TERMINALS (“constants”, strings that can’t bereplaced)
I One special variable called the START SYMBOL.
I Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book)
9/30
Grammars (Context-free Gramars)
I Collection of VARIABLES (things that can be replaced by otherthings), also called NON-TERMINALS.
I Collection of TERMINALS (“constants”, strings that can’t bereplaced)
I One special variable called the START SYMBOL.
I Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book)
9/30
Grammars (Context-free Gramars)
Grammar
A, B, and C are non-terminals.0, 1, and 2 are terminals.The start symbol is A.The rules are:
I A→ 0A|1C |2B|0I B → 0B|1A|2C |1I C → 0C |1B|2A|2
Can 2011020 can be parsed?
10/30
Grammars (Context-free Gramars)
Grammar
A, B, and C are non-terminals.0, 1, and 2 are terminals.The start symbol is A.The rules are:
I A→ 0A|1C |2B|0I B → 0B|1A|2C |1I C → 0C |1B|2A|2
Can 2011020 can be parsed?
10/30
Grammars (Context-free Gramars)
Grammar
A, B, and C are non- terminals.0, 1, and 2 are terminals.The start symbol is A, the rules are:
I A→ 0A|1C |2B|0I B → 0B|1A|2C |1I C → 0C |1B|2A|2
Can 1112202 can be parsed?Can 00102 can be parsed?Can 2120 can be parsed?
11/30
Grammars (Context-free Gramars)
Grammar
A, B, and C are non- terminals.0, 1, and 2 are terminals.The start symbol is A, the rules are:
I A→ 0A|1C |2B|0I B → 0B|1A|2C |1I C → 0C |1B|2A|2
Can 1112202 can be parsed?Can 00102 can be parsed?Can 2120 can be parsed?
11/30
12/30
Syntactic Analysis
I The process of verifying that a token stream represents a validapplication of the rules is called parsing.
I Using the BNF rules we can construct a parse tree:
13/30
Sample Parse Tree (portion)
14/30
Sample Parse Tree (failed)
15/30
Grammar for Java (version 8)
I Overview of notation used:https://docs.oracle.com/javase/specs/jls/se8/html/
jls-2.html
I The full syntax grammar:https://docs.oracle.com/javase/specs/jls/se8/html/
jls-19.html
16/30
Compiling
So far, we have looked at:
I the scanner (lexical analysis)–tokenizes input
I the parser (syntactic analysis)–validates structure
I Next, we do semantic analysis: is the program meaningful?
EXAMPLE:
In Java,int i, i, i;
has the right structure for a declaration, but it’s not legal toredeclare i within the same block of code.
17/30
Compiling
So far, we have looked at:
I the scanner (lexical analysis)–tokenizes input
I the parser (syntactic analysis)–validates structure
I Next, we do semantic analysis: is the program meaningful?
EXAMPLE:
In Java,int i, i, i;
has the right structure for a declaration, but it’s not legal toredeclare i within the same block of code.
17/30
Compiling
So far, we have looked at:
I the scanner (lexical analysis)–tokenizes input
I the parser (syntactic analysis)–validates structure
I Next, we do semantic analysis: is the program meaningful?
EXAMPLE:
In Java,int i, i, i;
has the right structure for a declaration, but it’s not legal toredeclare i within the same block of code.
17/30
Semantic Analysis
I During lexical analysis and parsing, as we process tokens wegather the user-defined names into a symbol table.
I Symbol table contains information such as:I where the symbol first appeared (usually in a declaration)I whether it has an initial value (parsing will tell us this)I what its type is (parsing tells us), etc.
18/30
Semantic Analysis
I During lexical analysis and parsing, as we process tokens wegather the user-defined names into a symbol table.
I Symbol table contains information such as:I where the symbol first appeared (usually in a declaration)I whether it has an initial value (parsing will tell us this)I what its type is (parsing tells us), etc.
18/30
As the parser encounters names, it looks them up to see if they arealready declared; if not, it creates a table entry. (Some names arepre-declared as part of the language.)
19/30
20/30
21/30
22/30
Intermediate Code Generation
I During this phase, the parsed program is converted into asimpler, step-by-step description in some intermediatelanguage.
I Intermediate language may exist only as an internalrepresentation within the compiler–it does not need be an“actual” language).
I A simple example is something called “three-address code.”
23/30
Intermediate Code Generation
I During this phase, the parsed program is converted into asimpler, step-by-step description in some intermediatelanguage.
I Intermediate language may exist only as an internalrepresentation within the compiler–it does not need be an“actual” language).
I A simple example is something called “three-address code.”
23/30
Intermediate Code Generation
24/30
Why Intermediate Code Generation?
25/30
Why Intermediate Code Generation?
25/30
Optimization
26/30
Optimization
27/30
Optimization
Here is a list of the many, many kinds of optimizations:http://www.compileroptimizations.com/
(The examples show the effects on source code, but theoptimizations are usually made on the intermediate code.)
28/30
Code Generation
I It is usually very straightforward to generate machineinstructions from intermediate code, since the intermediatecode is simple.
I Some further machine-specific optimizations may take placeduring or after this stage.
29/30
Pipelining
I The steps in compilation don’t need to be done in wholephases, one after the other, but can be “pipelined”:
I int count = 1;
create tokens, pass to parser, generate some intermediate code
I j = j + count;
create tokens, pass to parser, generate some intermediate code... etc. ...
30/30