Programming Language Concepts Lexical and Syntactic Analysis Janyl Jumadinova 24 January, 2017

Programming Language Concepts Lexical and Syntactic Analysis · 2018-08-27 · Most Important Steps in Compilation I Optional Preprocessing I Lexical analysis (scanning) I Syntax


Programming Language Concepts
Lexical and Syntactic Analysis

Janyl Jumadinova
24 January, 2017


Most Important Steps in Compilation

- Optional Preprocessing
- Lexical analysis (scanning)
- Syntax analysis (parsing)
- Semantic analysis
- Intermediate code generation
- Optimization (usually machine-independent)
- Final code generation
- Optional final optimization


Lexical Analysis

For each token type, give a description:

- either a literal string (e.g., “≤” or “while” to describe an operator or reserved word),
- or a < rule > (e.g., the rule < unsigned int > might stand for “a sequence of one or more digits”; the rule < identifier > might stand for “a letter followed by a sequence of zero or more letters or digits”).


Lexical analysis produces a “token stream” in which the program is reduced to a sequence of token types, each with its identifying number and the actual string (in the program) corresponding to it.
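A minimal sketch of such a scanner in Python, to make the idea concrete. The token names, their numbering, and the regular expressions are assumptions chosen for illustration; they follow the literal-string and rule descriptions given above, not any particular compiler.

```python
import re

# Hypothetical token types: each is a literal string or a rule (a regex).
# Order matters: the reserved word is tried before the identifier rule.
TOKEN_SPECS = [
    ("WHILE", r"while\b"),                    # literal reserved word
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letter, then letters or digits
    ("UNSIGNED_INT", r"[0-9]+"),              # one or more digits
    ("LESS_EQUAL", r"<=|≤"),                  # literal operator
    ("SKIP", r"\s+"),                         # whitespace, not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPECS))
# Identifying number of each token type (1, 2, 3, ... in the order above).
TYPE_NUMBER = {name: number for number, (name, _) in enumerate(TOKEN_SPECS, start=1)}


def tokenize(text):
    """Return the token stream as (type number, type name, actual string) triples."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((TYPE_NUMBER[match.lastgroup], match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("while x ≤ 10"):
    print(token)
```

Each printed triple pairs a token type (number and name) with the string from the program that matched it, which is the information the token stream carries to the parser.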


Syntactic Analysis

- The syntax of a language is described by a grammar that specifies the legal combinations of tokens.
- Grammars are often specified in BNF notation (“Backus-Naur Form”):

<item1> ::= valid replacements for <item1>

<item2> ::= valid replacements for <item2>


Grammars (Context-free Grammars)

- Collection of VARIABLES (things that can be replaced by other things), also called NON-TERMINALS.
- Collection of TERMINALS (“constants”, strings that can’t be replaced)
- One special variable called the START SYMBOL.
- Collection of RULES, also called PRODUCTIONS.

variable → rule1 | rule2 | rule3 | ...

You can also write each rule on a separate line (as in the book)


Grammar

A, B, and C are non-terminals. 0, 1, and 2 are terminals. The start symbol is A. The rules are:

- A → 0A | 1C | 2B | 0
- B → 0B | 1A | 2C | 1
- C → 0C | 1B | 2A | 2

Can 2011020 be parsed?


Can 1112202 be parsed? Can 00102 be parsed? Can 2120 be parsed?
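One way to check such questions mechanically is sketched below. It assumes the A, B, C grammar given above and relies on the fact that every rule has the form "terminal followed by one non-terminal" or "a single terminal", so a string can be matched left to right without backtracking. The names `RULES` and `can_parse` are made up for this illustration.

```python
# Productions of the grammar above: non-terminal -> list of alternatives.
RULES = {
    "A": ["0A", "1C", "2B", "0"],
    "B": ["0B", "1A", "2C", "1"],
    "C": ["0C", "1B", "2A", "2"],
}


def can_parse(string, start="A"):
    """Return True if `string` can be derived from the start symbol."""
    if not string:
        return False  # every rule produces at least one terminal
    current = start
    for index, symbol in enumerate(string):
        if index == len(string) - 1:
            # The last symbol must come from a terminal-only alternative.
            return symbol in RULES[current]
        # Otherwise pick the alternative "symbol + next non-terminal".
        followers = [alt[1] for alt in RULES[current]
                     if len(alt) == 2 and alt[0] == symbol]
        if not followers:
            return False
        current = followers[0]
    return False


for candidate in ["2011020", "1112202", "00102", "2120"]:
    print(candidate, can_parse(candidate))
```

For example, 2011020 follows the derivation A → 2B → 20B → 201A → 2011C → 20110C → 201102A → 2011020, while 2120 gets stuck at A → 2B → 21A → 212B because B has no rule that produces a final 0.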


Syntactic Analysis

- The process of verifying that a token stream represents a valid application of the rules is called parsing.
- Using the BNF rules we can construct a parse tree:


Sample Parse Tree (portion)


Sample Parse Tree (failed)
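The sketch below shows one way a parser might build a parse tree from BNF rules, and what a failed parse looks like. It is not the grammar or tree from the slides; the two rules in the comments, the tuple-based tree shape, and the function names are all assumptions made for illustration.

```python
# Hypothetical grammar used for this sketch:
#   <expr> ::= <term> "+" <expr> | <term>
#   <term> ::= a sequence of digits

def parse_term(tokens, pos):
    """Match <term>; return (subtree, next position)."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return ("term", tokens[pos]), pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise SyntaxError(f"expected a number at position {pos}, found {found!r}")


def parse_expr(tokens, pos=0):
    """Match <expr>; return (subtree, next position)."""
    term, pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        rest, pos = parse_expr(tokens, pos + 1)
        return ("expr", term, "+", rest), pos
    return ("expr", term), pos


def parse(tokens):
    """Return the parse tree for the whole token list, or raise SyntaxError."""
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree


print(parse(["1", "+", "2"]))
```

A token list such as `["1", "+"]` has no valid tree under these rules, so `parse` raises `SyntaxError` instead of returning one.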


Grammar for Java (version 8)

- Overview of notation used: https://docs.oracle.com/javase/specs/jls/se8/html/jls-2.html
- The full syntax grammar: https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html


Compiling

So far, we have looked at:

- the scanner (lexical analysis): tokenizes input
- the parser (syntactic analysis): validates structure

Next, we do semantic analysis: is the program meaningful?

EXAMPLE:

In Java,

    int i, i, i;

has the right structure for a declaration, but it’s not legal to redeclare i within the same block of code.


Semantic Analysis

- During lexical analysis and parsing, as we process tokens we gather the user-defined names into a symbol table.
- The symbol table contains information such as:
  - where the symbol first appeared (usually in a declaration)
  - whether it has an initial value (parsing will tell us this)
  - what its type is (parsing tells us), etc.


As the parser encounters names, it looks them up to see if they are already declared; if not, it creates a table entry. (Some names are pre-declared as part of the language.)
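A symbol table along these lines can be sketched as a dictionary keyed by name. The class, its method names, and the fields stored per entry are assumptions for illustration; a real compiler would also track scopes, which this sketch leaves out by modelling a single block.

```python
class SymbolTable:
    """Symbol table for one block: name -> information gathered while parsing."""

    def __init__(self, predeclared=()):
        # Some names are pre-declared as part of the language.
        self.entries = {
            name: {"type": "predeclared", "line": None, "initialized": True}
            for name in predeclared
        }

    def lookup(self, name):
        """Return the entry for `name`, or None if it has not been declared."""
        return self.entries.get(name)

    def declare(self, name, type_name, line, initialized=False):
        """Create an entry, rejecting a second declaration in the same block."""
        if self.lookup(name) is not None:
            raise NameError(f"line {line}: '{name}' is already declared")
        self.entries[name] = {
            "type": type_name,           # what its type is
            "line": line,                # where the symbol first appeared
            "initialized": initialized,  # whether it has an initial value
        }


# The declaration `int i, i, i;` is structurally fine but fails this check.
table = SymbolTable()
try:
    for name in ["i", "i", "i"]:
        table.declare(name, "int", line=1)
except NameError as error:
    print("semantic error:", error)
```

The first `i` creates an entry; the second lookup finds it, which is how a redeclaration within the same block is detected.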


Intermediate Code Generation

- During this phase, the parsed program is converted into a simpler, step-by-step description in some intermediate language.
- The intermediate language may exist only as an internal representation within the compiler (it does not need to be an “actual” language).
- A simple example is something called “three-address code.”
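To illustrate three-address code, the sketch below flattens a small expression tree into instructions that each use at most one operator and name their result. The tuple representation `(operator, left, right)`, the temporary names `t1`, `t2`, ..., and the function name are assumptions for this example.

```python
import itertools


def to_three_address(expression):
    """Flatten a nested (operator, left, right) tuple into three-address code.

    Returns the list of instructions and the name holding the final value.
    """
    code = []
    temp_numbers = itertools.count(1)

    def emit(node):
        if isinstance(node, str):
            return node  # a variable or constant needs no instruction
        operator, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(temp_numbers)}"
        code.append(f"{temp} = {left_name} {operator} {right_name}")
        return temp

    result = emit(expression)
    return code, result


# a + b * c, already parsed into a tree
instructions, result = to_three_address(("+", "a", ("*", "b", "c")))
for line in instructions:
    print(line)
```

For `a + b * c` this yields `t1 = b * c` followed by `t2 = a + t1`: each step is simple enough to map onto a machine instruction later.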


Why Intermediate Code Generation?


Optimization

Here is a list of the many, many kinds of optimizations: http://www.compileroptimizations.com/

(The examples show the effects on source code, but the optimizations are usually made on the intermediate code.)
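As one small example of an optimization applied to an intermediate form rather than to source text, the sketch below performs constant folding on an expression tree. The tuple representation and the function name are assumptions for illustration, and constant folding is only one of the many kinds of optimization.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Replace subtrees whose operands are all integer constants by their value."""
    if not isinstance(node, tuple):
        return node  # a variable name or a constant
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPERATIONS[op](left, right)
    return (op, left, right)


# x + 2 * 3 becomes x + 6: the multiplication is done once, at compile time.
print(fold_constants(("+", "x", ("*", 2, 3))))
```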


Code Generation

- It is usually very straightforward to generate machine instructions from intermediate code, since the intermediate code is simple.
- Some further machine-specific optimizations may take place during or after this stage.
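The sketch below suggests why this step can be straightforward: each three-address instruction of the form `dest = left op right` maps onto a fixed pattern of machine instructions. The instruction names (`LOAD`, `ADD`, `SUB`, `MUL`, `STORE`) and the single register `R1` belong to a made-up machine, not a real instruction set.

```python
# Mnemonics of a hypothetical machine, chosen for illustration only.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(three_address_code):
    """Translate lines like 't1 = b * c' into instructions for the made-up machine."""
    assembly = []
    for line in three_address_code:
        dest, _, left, op, right = line.split()
        assembly.append(f"LOAD R1, {left}")            # bring the left operand in
        assembly.append(f"{MNEMONICS[op]} R1, {right}")  # apply the operator
        assembly.append(f"STORE {dest}, R1")           # save the result
    return assembly


for instruction in generate(["t1 = b * c", "t2 = a + t1"]):
    print(instruction)
```

The output stores `t1` and immediately reloads it, which is the kind of redundancy a later machine-specific optimization could remove.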


Pipelining

- The steps in compilation don’t need to be done in whole phases, one after the other, but can be “pipelined”:
  - int count = 1;
    create tokens, pass to parser, generate some intermediate code
  - j = j + count;
    create tokens, pass to parser, generate some intermediate code
  - ... etc. ...
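Python generators give a compact way to sketch this pipelining: each stage pulls one statement's worth of work from the previous stage, so the first statement is fully processed before the second is even scanned. The stage functions below are placeholders (the "parser" and "code generator" only pass tokens along and record that they ran); the point is the order of events in the log.

```python
import re


def scan(statements, log):
    """Stage 1: turn each statement into a list of tokens."""
    for statement in statements:
        log.append(f"scan: {statement}")
        yield re.findall(r"\w+|[^\w\s]", statement)


def parse(token_lists, log):
    """Stage 2 (placeholder): a real parser would validate structure here."""
    for tokens in token_lists:
        log.append(f"parse: {len(tokens)} tokens")
        yield tokens


def generate(parsed, log):
    """Stage 3 (placeholder): a real back end would emit intermediate code."""
    for tokens in parsed:
        log.append("generate: " + " ".join(tokens))
        yield tokens


def compile_pipelined(statements):
    """Run the three stages as a pipeline and return the order of events."""
    log = []
    for _ in generate(parse(scan(statements, log), log), log):
        pass
    return log


for event in compile_pipelined(["int count = 1;", "j = j + count;"]):
    print(event)
```

The log shows scan, parse, generate for `int count = 1;` and only then scan, parse, generate for `j = j + count;`, rather than all scanning followed by all parsing.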
