
Compiler Construction

Mayer Goldberg \ Ben-Gurion University

October 31, 2018


Chapter 2

Goals
▶ The pipeline of the compiler
▶ Introduction to syntactic analysis
▶ Further steps in OCaml

Agenda
▶ The pipeline
  ▶ Syntactic analysis
  ▶ Semantic analysis
  ▶ Code generation
▶ The compiler for the course
▶ The language of S-expressions
▶ More OCaml


Refresher

Last week, we discussed
▶ The interpreter as an evaluation function
▶ The compiler as a translator & optimizer
▶ We explored the relations between interpretation & compilation

This was a rather high-level view of the area.
We now wish to consider compilation as a large software project.


Compilation as translation

A compiler translates between languages:
▶ Understanding the syntax of the program
  ▶ What kinds of statements & expressions there are
  ▶ What are the various parts of these statements & expressions
  ▶ Are they syntactically correct
▶ Understanding the meaning of the program
  ▶ Do the operations make sense?
    ▶ What are their types?
    ▶ Are they used in accordance with their types?
  ▶ On what data is the program acting?
  ▶ What is returned?
▶ Once we understand the syntax and meaning of a sentence, we can render it in another language


The pipeline of the compiler

Since the 1950’s, the standard architecture for compilers has been a pipeline:
▶ Syntactic analysis
  ▶ Scanning
  ▶ Parsing
    ▶ Reading
    ▶ Tag-Parsing
▶ Semantic analysis
▶ Code generation

[Figure: the compiler pipeline. chars → Scanner → tokens → Reader → sexprs → Tag-Parser → ASTs → Semantic Analyser → ASTs → Code Generator → asm / mach lang. The Reader and the Tag-Parser together form the Parser.]
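One way to read the pipeline is as a composition of functions, each consuming the output of the previous stage. The following OCaml sketch illustrates only that idea: the labels mirror the stage names above, while `compile`, `toy_compile`, and the toy stages are hypothetical stand-ins, not the course's actual code.

```ocaml
(* A sketch only: each stage is modelled as a function, and the compiler is
   the composition of the stages. The labels mirror the pipeline stages. *)
let compile ~scanner ~reader ~tag_parser ~semantic_analyser ~code_generator chars =
  chars
  |> scanner            (* chars  -> tokens *)
  |> reader             (* tokens -> sexprs *)
  |> tag_parser         (* sexprs -> ASTs *)
  |> semantic_analyser  (* ASTs   -> annotated ASTs *)
  |> code_generator     (* ASTs   -> assembly / machine language *)

(* Toy stand-in stages (hypothetical), present only so that the sketch runs:
   the source language is a sequence of integers separated by spaces. *)
let toy_compile : string -> string =
  compile
    ~scanner:(String.split_on_char ' ')
    ~reader:(List.filter (fun lexeme -> lexeme <> ""))
    ~tag_parser:(List.map int_of_string)
    ~semantic_analyser:(fun asts -> asts)
    ~code_generator:(fun asts ->
      String.concat "\n" (List.map (Printf.sprintf "mov rax, %d") asts))
```

Because every stage only sees the output of its predecessor, a stage can be replaced or tested in isolation as long as its input and output types stay the same.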


The pipeline of the compiler

The stages in the compiler pipeline are distinguished by
▶ Function: What they do
▶ Dependencies: Which stages depend on which other
▶ Complexity: How difficult it is to perform a stage

In programming languages:
▶ Understanding syntax is relatively straightforward (unlike in natural language)
▶ Understanding meaning is way harder than understanding syntax
▶ Meaning is built upon syntax (in natural languages, syntax & meaning can be inter-dependent)
▶ Code generation is relatively straightforward (template-based)


The pipeline of the compiler

Optimizations
How optimizations fit into the pipeline of the compiler:
▶ We distinguish [at least] two levels of optimizations:
  ▶ High-level optimizations (closer to the source language) would go into the semantic analysis phase
  ▶ Low-level optimizations (closer to assembly language) would go into the code generation phase

This distinction can be fuzzy. Some make it fuzzier with intermediate-level optimizations.


An example of a high-level optimization

Suppose the compiler can know that the value of n is 0 when reaching the following statement:

    if (n == 0) {
        foo();
    }
    else {
        goo(n);
    }

Then an obvious optimization to perform would be to replace the if-statement with:

    foo();


How has the code improved:

Before

    if (n == 0) {
        foo();
    }
    else {
        goo(n);
    }

After

    foo();

What was gained
▶ The test during run-time has been eliminated
▶ The code is shorter
▶ The change may lead to further, cascading optimizations
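A high-level optimization of this kind is naturally written as a transformation over ASTs. The sketch below is a minimal illustration in OCaml, assuming a hypothetical miniature `expr` type and a hypothetical `known` table of variables whose values the compiler has already established; it is not the course's optimizer.

```ocaml
(* A hypothetical miniature AST, just large enough for the example. *)
type expr =
  | Const of int
  | Var of string
  | Eq of expr * expr
  | Call of string * expr list
  | If of expr * expr * expr

(* [known] maps variable names to values established at compile time.
   Known variables are replaced by constants, and an if-expression whose
   test compares two constants is replaced by the branch that will run. *)
let rec simplify known e =
  match e with
  | Const _ -> e
  | Var x ->
    (match List.assoc_opt x known with
     | Some n -> Const n
     | None -> e)
  | Eq (a, b) -> Eq (simplify known a, simplify known b)
  | Call (f, args) -> Call (f, List.map (simplify known) args)
  | If (test, dit, dif) ->
    (match simplify known test with
     | Eq (Const a, Const b) ->
       if a = b then simplify known dit else simplify known dif
     | test' -> If (test', simplify known dit, simplify known dif))

(* if (n == 0) { foo(); } else { goo(n); } *)
let before =
  If (Eq (Var "n", Const 0), Call ("foo", []), Call ("goo", [Var "n"]))

(* With n known to be 0, only the call to foo remains. *)
let after = simplify [("n", 0)] before
```

When nothing is known about `n`, `simplify [] before` returns the if-expression unchanged, which is the safe default.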


An example of a low-level optimization

Before:

    mov rax, 1
    mov rax, 2

After:

    mov rax, 2


How has the code improved:

Before

    mov rax, 1
    mov rax, 2

After

    mov rax, 2

What was gained
▶ Saved 1 cycle
▶ Made the code smaller
▶ If this code appears within a loop, gains shall be multiplied…
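Low-level optimizations like this one are often implemented as "peephole" passes that slide a small window over the instruction stream. The OCaml sketch below assumes a hypothetical two-instruction model (`MovImm`, `AddReg`); it is meant only to show the shape of such a pass, not a real x86-64 optimizer.

```ocaml
(* A hypothetical, tiny instruction model, for illustration only. *)
type instr =
  | MovImm of string * int      (* mov reg, imm *)
  | AddReg of string * string   (* add dst, src *)

(* Drop a mov-immediate whose destination register is overwritten by the
   very next instruction, so the first value can never be observed. *)
let rec peephole = function
  | MovImm (r1, _) :: (MovImm (r2, _) :: _ as rest) when r1 = r2 ->
    peephole rest
  | i :: rest -> i :: peephole rest
  | [] -> []

(* mov rax, 1 ; mov rax, 2   becomes   mov rax, 2 *)
let optimized = peephole [MovImm ("rax", 1); MovImm ("rax", 2)]
```

The pass deliberately looks only at adjacent instructions: if anything sits between the two moves, the first one is kept, since the intervening instruction might read the register.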


The pipeline of the compiler

Basic concepts
▶ Concrete syntax
▶ Abstract syntax
▶ Abstract Syntax-Tree (AST)
▶ Token
▶ Delimiter
▶ Whitespace


Concrete syntax (continued)
The concrete syntax of a computer program is a textual stream of characters:
▶ It’s one-dimensional
▶ Lacking in structure
  ▶ No nesting
  ▶ No sub-expressions
▶ Difficult to work with
  ▶ Difficult to access parts
  ▶ Difficult to determine correctness
▶ Contains redundancies (spaces, comments, etc)

    (define fact (lambda (n) (if (zero? n) 1 (* n (fact (- n 1))))))

Think of
▶ A text file
▶ Characters typed at the prompt


Abstract syntax

The abstract syntax of a computer program is a tree-like data-structure. It is:
▶ Multi-dimensional
▶ Conveys structure
  ▶ Nested
  ▶ Recursive (following the inductive definition of the grammar)
▶ Easier to work with than the concrete syntax
  ▶ Easier to access parts
  ▶ Easier to verify correctness
  ▶ Some syntactic correctness issues have already been decided


Abstract Syntax-Tree (AST)

Notice
▶ The AST is a tree
▶ No text
▶ No parentheses
▶ No spaces, tabs, newlines
▶ The structure is evident
▶ Easy to find sub-expressions
▶ Easy to determine correctness
▶ Easier to analyze, transform, and compile

[Figure: The AST of fact]
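To make the tree concrete, the AST of fact can be written down as a value. The following is a minimal OCaml sketch, assuming a hypothetical `expr` type; the constructor names (`Const`, `Var`, `If`, `Lambda`, `Applic`, `Define`) are illustrative choices and not necessarily the definitions used in the course.

```ocaml
(* A hypothetical AST type covering just what the factorial example needs. *)
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Lambda of string list * expr
  | Applic of expr * expr list
  | Define of string * expr

(* The AST of the factorial definition: no parentheses, no whitespace,
   only nested constructors. *)
let fact_ast =
  Define ("fact",
    Lambda (["n"],
      If (Applic (Var "zero?", [Var "n"]),
          Const 1,
          Applic (Var "*",
                  [Var "n";
                   Applic (Var "fact",
                           [Applic (Var "-", [Var "n"; Const 1])])]))))

(* Sub-expressions are easy to reach: count the procedure applications. *)
let rec count_applics = function
  | Const _ | Var _ -> 0
  | If (test, dit, dif) ->
    count_applics test + count_applics dit + count_applics dif
  | Lambda (_, body) -> count_applics body
  | Applic (proc, args) ->
    List.fold_left (fun acc a -> acc + count_applics a)
      (1 + count_applics proc) args
  | Define (_, value) -> count_applics value
```

A traversal such as `count_applics` is a few lines of pattern matching over the tree; doing the same over the raw character stream would require re-parsing the text.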


Concrete vs Abstract Syntax
▶ Parsing: going from concrete syntax to abstract syntax
▶ Parser: the tool that performs parsing

Concrete Syntax
▶ Lacks structure
▶ Prone to errors
▶ Hard to delimit sub-expressions
▶ Inefficient to work with
▶ Concrete Syntax can be avoided
  ▶ Visual languages
  ▶ Structure/syntax editors

Abstract Syntax
▶ Has structure
▶ Many kinds of errors are avoided
▶ Sub-Expressions are readily accessible
▶ Efficient to work with


Tokens

▶ The smallest, meaningful, lexical unit in a language
▶ Described using regular expressions
▶ Identified using a DFA (deterministic finite automaton, a very simple model of computation)
▶ Examples
  ▶ Numbers
  ▶ [Non-nested] Strings
  ▶ Names (variables, functions)
  ▶ Punctuation
▶ Regular expressions cannot handle nesting of any kind:
  ▶ Parenthesized expressions
  ▶ Nested comments
  ▶ etc.


Tokens

▶ Scanning: going from characters into tokens
▶ Scanner: the tool that performs scanning
▶ Scanner generator: the tool that takes definitions for tokens, using regular expressions (and callback functions), and returns a scanner
  ▶ Examples of scanner-generators: lex, flex
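For a feel of what a scanner does, here is a small hand-written one in OCaml. It is a sketch under simplifying assumptions: the `token` type, `is_delimiter`, and `scan` are hypothetical names, and only parentheses, integers, and symbols are recognized. A generated scanner would derive the same behaviour from regular expressions.

```ocaml
(* Hypothetical token type for a tiny subset of Scheme. *)
type token =
  | LParen
  | RParen
  | Number of int
  | Symbol of string

(* Characters that end the current token. *)
let is_delimiter c =
  c = ' ' || c = '\t' || c = '\n' || c = '(' || c = ')'

(* Turn a string of characters into a list of tokens. *)
let scan (s : string) : token list =
  let n = String.length s in
  let rec loop i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> loop (i + 1) acc          (* skip whitespace *)
      | '(' -> loop (i + 1) (LParen :: acc)
      | ')' -> loop (i + 1) (RParen :: acc)
      | _ ->
        (* Read up to the next delimiter, then classify the lexeme. *)
        let j = ref i in
        while !j < n && not (is_delimiter s.[!j]) do incr j done;
        let lexeme = String.sub s i (!j - i) in
        let tok =
          match int_of_string_opt lexeme with
          | Some k -> Number k
          | None -> Symbol lexeme
        in
        loop !j (tok :: acc)
  in
  loop 0 []
```

For example, `scan "(fact (- n 1))"` yields the tokens for `(`, `fact`, `(`, `-`, `n`, `1`, `)`, `)`, with all whitespace gone. Note that the scanner does not check that parentheses balance; nesting is left to the parser.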


Delimiters

▶ Delimiters are characters that separate tokens
▶ In most languages spaces, parentheses, commas, semicolons, etc., are all delimiters
▶ Some tokens must be separated by delimiters
  ▶ Two consecutive numbers, two consecutive symbols, etc.
▶ Some tokens do not need to be separated by delimiters
  ▶ Two consecutive strings, an open parenthesis followed by a number, etc.
▶ Delimiters are language-dependent


Whitespace

▶ Whitespace refers to characters that
  ▶ Have no graphical representation
  ▶ Occur between tokens
    ▶ Spaces within strings are not whitespaces…
  ▶ Serve no syntactic purpose other than as delimiters and for indentation
▶ Whitespace is language-dependent


Delimiters in various languages

C & Scheme
Spaces, tabs, newlines, carriage returns, form feeds are examples of whitespaces.

Java
Literal newline characters may not occur inside a literal string (must use \n). Otherwise, similar to C & Scheme.

Python
Leading tabs and spaces (indentation) are not whitespaces because they have a clear syntactic function: They denote nesting level.


Concrete vs Abstract syntax

Artifacts of the Concrete Syntax
▶ Delimiters & whitespaces
▶ Parentheses, brackets, braces, and other grouping and nesting mechanisms (e.g., begin...end)

Re-examine the concrete and abstract syntax for the factorial function, and notice what’s gone!


Concrete vs Abstract syntax (continued)

The concrete syntax

    (define fact (lambda (n)
      (if (zero? n) 1
          (* n (fact (- n 1))))))

The abstract syntax

[Figure: The AST of fact]


More on parsing

To parse computer programs in a given language, we rely on:
▶ Grammars with which to express the syntax of the language
  ▶ There are different kinds of grammars (CFG, CSG, two-level, etc)
  ▶ There are different languages for expressing syntax in a grammar (e.g., BNF)
▶ Algorithms for parsing programs as per kind of grammar
▶ Techniques (e.g., parsing combinators, DCGs)

Parser generator: Takes a description of the grammar for a language, and generates a parser. For example, yacc, bison, nearley, etc.


The pipeline of the compiler (continued)

Scanning
▶ Going from characters to tokens
▶ Identifying & grouping characters into tokens for words, numbers, strings, etc.
▶ Parsing over tokens is more efficient than parsing over characters
☞ As the parser examines various ways to parse the code, the parser can avoid re-identifying and re-building complex tokens such as numbers, strings, etc


The pipeline of the compiler (continued)

Reading
▶ In LISP/Prolog, the parser is split into two components:
  ▶ The reader, or the parser for the data language
  ▶ The tag-parser, or the parser for the source code
▶ In LISP/Scheme/Racket/Clojure/etc, the abstract syntax for the data is the concrete syntax for the code
▶ In Prolog, the abstract syntax for the data is the abstract syntax for the code
  ▶ Prolog is the programming language with the most powerful capabilities of reflection, i.e., code examining and working with itself.


The pipeline of the compiler (continued)

Reading: Summary
▶ In programming languages in which the syntax of code is not a part of the syntax of data, concrete syntax is given as a stream of characters
▶ In programming languages in which the syntax of code is part of the syntax of data, things are a bit more complex:
  ▶ The concrete syntax of data is a stream of characters
  ▶ The concrete syntax of code is the abstract syntax of the data
▶ In Scheme, the language of data is called S-expressions (more on this, later)


The pipeline of the compiler (continued)

Tag-Parsing
▶ The tag-parser takes sexprs and returns [ASTs for] exprs
▶ Languages outside the LISP & Prolog families do not split parsing into a reader & tag-parser
  ▶ In such languages, parsing goes directly from tokens to [ASTs for] expressions
☞ Every valid program "used to be" [i.e., before tag-parsing] a valid sexpr
☞ Not every valid sexpr is a valid program!
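The last two points can be made concrete with a tiny tag-parser. The OCaml sketch below assumes hypothetical `sexpr` and `expr` types and handles only numbers, variables, a three-part `if`, and applications; the names and the `Syntax_error` exception are illustrative assumptions, not the course's definitions.

```ocaml
(* Hypothetical data language: what the reader produces. *)
type sexpr =
  | Nil
  | Number of int
  | Symbol of string
  | Pair of sexpr * sexpr

(* Hypothetical code language: what the tag-parser produces. *)
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Applic of expr * expr list

exception Syntax_error of string

(* Go from sexprs to ASTs, rejecting sexprs that are not valid programs. *)
let rec tag_parse = function
  | Number n -> Const n
  | Symbol x -> Var x
  | Pair (Symbol "if", Pair (test, Pair (dit, Pair (dif, Nil)))) ->
    If (tag_parse test, tag_parse dit, tag_parse dif)
  | Pair (Symbol "if", _) -> raise (Syntax_error "malformed if")
  | Pair (proc, args) ->
    Applic (tag_parse proc, List.map tag_parse (list_of args))
  | Nil -> raise (Syntax_error "empty application")

(* Convert a proper list of sexprs into an OCaml list. *)
and list_of = function
  | Nil -> []
  | Pair (a, rest) -> a :: list_of rest
  | _ -> raise (Syntax_error "improper argument list")

(* (if 1 2 3) is both a valid sexpr and a valid program. *)
let good = Pair (Symbol "if", Pair (Number 1, Pair (Number 2, Pair (Number 3, Nil))))

(* (if 1) is a perfectly valid sexpr, but not a valid program here. *)
let bad = Pair (Symbol "if", Pair (Number 1, Nil))
```

Here `tag_parse good` returns `If (Const 1, Const 2, Const 3)`, while `tag_parse bad` raises `Syntax_error`: the reader accepted both, and only the tag-parser tells them apart.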


The pipeline of the compiler (continued)

Question
A parser should:
👎 Perform optimizations
👎 Evaluate expressions
👎 Raise type-mismatch errors
👎 Find potential runtime errors (null-pointer dereferences, array-index errors, etc.)
👍 Validate the structure of input programs against a syntactic specification


The pipeline of the compiler (continued)

Question
Using an AST, it is impossible to:
👎 Perform code reformatting/beautification/style-checking
👎 Perform optimizations
👎 Output a new program which is semantically equivalent to the input program (code generation)
👎 Refactor the input program
👍 Generate a list of all the comments in the code


The pipeline of the compiler (continued)

Semantic Analysis
▶ Annotate the ASTs
▶ Compute addresses
▶ Annotate tail-calls
▶ Type-check code
▶ Perform optimizations


The pipeline of the compiler (continued)

Code Generation
▶ Generate a stream of instructions in
  ▶ assembly language
  ▶ machine language
    ▶ Build executable
  ▶ some other target language…
▶ Perform low-level optimizations


The compiler for the course

Our compiler project
▶ Written in OCaml
▶ Supports a subset of Scheme + extensions
▶ Supports two simple optimizations
▶ Compiles to x86/64
▶ Runs on Linux

What our project shall lack
▶ Support for the full language of Scheme
▶ Support for garbage collection
▶ Self-compilation


S-expressions

▶ We’re going to learn about syntax by studying the syntax of Scheme
  ▶ After all, we’re writing a Scheme compiler…
  ▶ It’s relatively simple, compared to the syntax of C, Java, Python, and many other languages
  ▶ It comes with some interesting twists
▶ Scheme comes with two languages:
  ▶ A language for code
  ▶ A language for data
  and there’s a tricky relationship between the two.
▶ The key to understanding the syntax of Scheme is to think about data


The Language of Data

What is a language of data? A language in which to
▶ Describe arbitrarily-complex data
  ▶ Possibly multi-dimensional, deeply nested
  ▶ Polymorphic
▶ Access components easily and efficiently


The Language of Data (continued)

Today many languages of data are known:
▶ S-expressions (the first: 1959)
▶ Functors (1972)
▶ Datalog (1977)
▶ SGML (1986)
▶ MS DDE (1987)
▶ CORBA (1991)
▶ MS COM (1993)
▶ MS DCOM (1996)
▶ XML (1996)
▶ JSON (2001)


The Language of Data (continued)

What makes S-expressions and Functors unique?
▶ They’re the first… 😉
▶ They’re supported natively, as part of specific programming languages
  ▶ S-expressions are supported by LISP-based languages, including Scheme & Racket
  ▶ Functors are supported by Prolog-based languages
▶ The language of programming is a [strict] subset of the language of data


The Language of Data (continued)

Think for a moment about the language of XML: <something>...</something>, etc
▶ It’s not supported natively by most programming languages
▶ Most modern languages (Java, Python, etc) support it via libraries
▶ General-purpose programming languages are not written in XML:

    <package name="Foo">
      <class name="Foo">
        <method name="goo">
          ...
        </method>
      </class>
    </package>

  This would be cumbersome, and weird!


The Language of Data (continued)

However, if some programming language both
▶ Supported XML as its data language
▶ Were itself written in XML

Then a parser for XML could also read programs written in that language:
▶ Writing interpreters, compilers, and other language-tools would have been much simpler!
▶ Reflection (code examining code) would be simple


The Language of Data (continued)

This is the case with S-expressions:
▶ They are the data language for LISP-based languages, including Scheme
▶ LISP-based languages are written using S-expressions
▶ Writing interpreters and compilers in LISP-based languages is much simpler than in other languages
▶ Computational reflection was invented in LISP!
▶ This is the real reason behind all these parentheses in Scheme:
  ▶ A very simple language
  ▶ Supports core types: pairs, vectors, symbols, strings, numbers, booleans, the empty list, etc.
  ▶ A syntactic compromise that is great for expressing both code and data


S-expressions (continued)

Back to S-expressions
▶ S-expressions were invented along with LISP, in 1959
▶ S-expression stands for Symbolic Expression
▶ The term is intended to distinguish itself from numerical expressions
▶ Before LISP (and long after it was invented), most computation concerned itself with numbers
▶ Computer languages were great at "crunching numbers", but working with non-numeric data types was difficult
  ▶ String libraries were non-standard and uncommon
  ▶ Polymorphic data was unheard of
  ▶ Nested data structures needed to be implemented from scratch, usually with arrays of characters and/or integers…


S-expressions (continued)

Back to S-expressions
Then S-expressions were invented as part of a very dynamic programming language (LISP):
▶ Working with data structures became considerably simpler
  ▶ Trivially allocated (no pointer arithmetic)
  ▶ Polymorphic (lists of lists of numbers and strings and vectors of booleans and…)
  ▶ Easy to access sub-structures (no pointer arithmetic)
  ▶ Easy to modify (in an easygoing, functional style)
  ▶ Easy to redefine
  ▶ Automatically deallocated (garbage collection)
▶ Treating code as data became considerably simpler


S-expressions (continued)

Several fields were invented using LISP and its tools:
▶ Symbolic Mathematics (Macsyma, a precursor to Wolfram Mathematica)
▶ Artificial Intelligence
▶ Computer adventure game generation languages (MDL, ZIL)


S-expressions (continued)

Definition: S-expressions
The language is made up of
▶ The empty list: ()
▶ Booleans: #f, #t
▶ Characters: #\a, #\Z, #\space, #\return, #\x05d0, etc
▶ Strings: "abc", "Hello\nWorld\t\x05d0;hi!", etc
▶ Numbers: -23, #x41, 2/3, 2-3i, 2.34, -2.34+3.5i
▶ Symbols: abc, lambda, define, fact, list->string
▶ Pairs: (a . b), (a b c), (a (2 . #f) "moshe")
▶ Vectors: #(), #(a b ((1 . 2) #f) "moshe")

Traditionally, non-pairs are known as atoms.
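This definition maps almost directly onto an algebraic data type. The following OCaml sketch is one possible encoding, under stated assumptions: the type and constructor names are hypothetical, complex numbers are left out, and `Char of char` only covers single-byte characters (so it could not hold #\x05d0).

```ocaml
(* A hypothetical numeric tower, without complex numbers. *)
type number =
  | Int of int
  | Fraction of int * int    (* numerator, denominator *)
  | Float of float

(* One constructor per kind of S-expression in the definition. *)
type sexpr =
  | Nil                      (* the empty list *)
  | Bool of bool
  | Char of char             (* single-byte characters only *)
  | String of string
  | Number of number
  | Symbol of string
  | Pair of sexpr * sexpr
  | Vector of sexpr list

(* Traditionally, anything that is not a pair is an atom. *)
let is_atom = function
  | Pair _ -> false
  | _ -> true

(* The list (a (2 . #f) "moshe"), written out as nested pairs. *)
let example =
  Pair (Symbol "a",
    Pair (Pair (Number (Int 2), Bool false),
      Pair (String "moshe", Nil)))
```

Note that there is no separate constructor for lists: the three-element list above is just pairs nested in their second component and terminated by `Nil`.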


S-expressions (continued)

Proper & improper lists
▶ The name LISP comes from LISt Processing.
▶ In fact, LISP has no direct support for lists.
▶ LISP has ordered pairs
  ▶ Ordered pairs are created using cons
  ▶ The first and second projections over ordered pairs are car and cdr. For all x, y:
    ▶ (car (cons x y)) ≡ x
    ▶ (cdr (cons x y)) ≡ y
  ▶ The ordered pair of x and y can be written as (x . y)
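The two projection laws can be checked on a small model of pairs. This OCaml sketch assumes a hypothetical, cut-down `sexpr` type with only symbols, the empty list, and pairs; `cons`, `car`, and `cdr` are modelled as ordinary functions.

```ocaml
(* A hypothetical, cut-down S-expression type. *)
type sexpr =
  | Nil
  | Symbol of string
  | Pair of sexpr * sexpr

(* cons builds an ordered pair; car and cdr project out its two parts. *)
let cons x y = Pair (x, y)

let car = function
  | Pair (x, _) -> x
  | _ -> invalid_arg "car: not a pair"

let cdr = function
  | Pair (_, y) -> y
  | _ -> invalid_arg "cdr: not a pair"

(* The ordered pair (x . y) *)
let p = cons (Symbol "x") (Symbol "y")
```

With these definitions, `car p` is `Symbol "x"` and `cdr p` is `Symbol "y"`, while applying `car` or `cdr` to a non-pair is an error.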


S-expressions (continued)

The dot rules
Two rules govern how ordered pairs are printed:
▶ Rule 1: For any E, the ordered pair (E . ()) is printed as (E), which looks like a list of 1 element.
▶ Rule 2: For any E1, E2, …, the ordered pair (E1 . (E2 … )) is printed as (E1 E2 … )
▶ These rules just affect how pairs are printed
▶ These rules give us a canonical representation for pairs
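
One way to see the two dot rules at work is to write a printer that applies them. The following ocaml sketch uses a hypothetical, minimal S-expression type; the function names are assumptions, and this is not the printer of any particular Scheme system.

```ocaml
(* Hypothetical, minimal S-expression type, for illustration only. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

let rec to_string = function
  | Nil -> "()"
  | Symbol s -> s
  | Pair (car, cdr) -> "(" ^ to_string car ^ tail_to_string cdr

(* Prints whatever follows the car of a pair, including the closing paren. *)
and tail_to_string = function
  (* Rule 1: a cdr that is the empty list prints as nothing at all. *)
  | Nil -> ")"
  (* Rule 2: a cdr that is a pair drops its dot and its parentheses. *)
  | Pair (car, cdr) -> " " ^ to_string car ^ tail_to_string cdr
  (* Neither rule applies: the dot is kept. *)
  | atom -> " . " ^ to_string atom ^ ")"
```

Applied to the pair (a . (b . c)), this printer yields (a b . c): Rule 2 removes the inner dot, and the final dot stays because c is neither a pair nor the empty list.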


S-expressions (continued)

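Example
▶ The pair (a . (b . c)) is printed as (a b . c)

[Figure: box-and-pointer diagram. The outer PAIR has its CAR pointing to SYMBOL a and its CDR pointing to an inner PAIR, whose CAR points to SYMBOL b and whose CDR points to SYMBOL c.]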


S-expressions (continued)

Example
▶ The pair ((a . (b . ())) . ((c . (d . ())))) is printed as ((a b) (c d))

[Figure: box-and-pointer diagram with six PAIR nodes (each with a CAR and a CDR), four SYMBOL nodes (a, b, c, d), and three NIL terminators.]


S-expressions (continued)

▶ Lists in Scheme can come in two forms, proper lists and improper lists.
▶ When we just speak of lists, we usually mean proper lists.
▶ Most of the list-processing functions (length, map, etc.) work with proper lists.


S-expressions (continued)

Proper lists
▶ Proper lists are nested ordered pairs, the rightmost cdr of which is the empty list (aka nil)
▶ Testing for pairs is cheap, and is done by means of the builtin predicate pair?
▶ Testing for lists is expensive, since it traverses nested, ordered pairs until it reaches their rightmost cdr. This is done by means of the builtin predicate list?


S-expressions (continued)

Proper lists
Here’s a definition for list?:

(define list?
  (lambda (e)
    (or (null? e)
        (and (pair? e)
             (list? (cdr e))))))


S-expressions (continued)

Improper lists
▶ Pairs that are not proper lists are improper lists.
▶ Improper lists end with a rightmost cdr that is not nil
▶ List-processing procedures such as length, map, etc., do not work over improper lists
▶ There is no builtin procedure for testing improper lists, but it could be written as follows:

(define improper-list?
  (lambda (e)
    (and (pair? e)
         (not (list? (cdr e))))))
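
The same three predicates can be sketched in ocaml over a hypothetical, minimal S-expression type. The names below are assumptions chosen to mirror pair?, list?, and improper-list?; the sketch also makes the cost difference visible, since only the list test recurses.

```ocaml
(* Hypothetical, minimal S-expression type, for illustration only. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

(* Constant time: inspects only the outermost constructor. *)
let is_pair = function
  | Pair _ -> true
  | _ -> false

(* Linear time: follows the chain of cdrs to its rightmost end. *)
let rec is_proper_list = function
  | Nil -> true
  | Pair (_, cdr) -> is_proper_list cdr
  | Symbol _ -> false

(* A pair that is not a proper list. *)
let is_improper_list e = is_pair e && not (is_proper_list e)
```

Note that the empty list is a proper list but not a pair, so it is not an improper list either.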


S-expressions (continued)

Self-evaluating forms
Booleans, numbers, characters, and strings are self-evaluating forms. You can evaluate them directly at the prompt:

> 123
123
> "abc"
"abc"
> #t
#t
> #\m
#\m


S-expressions (continued)

Other forms
The empty list, pairs, and vectors cannot be evaluated directly at the prompt:
▶ Entering an empty list or a vector or an improper list at the prompt generates a run-time error.
▶ Entering a symbol at the prompt causes Scheme to attempt to evaluate a variable by the same name
▶ Entering a proper list, that is not the empty list, at the prompt causes Scheme to attempt to evaluate an application:

> (a b c)
Exception: variable b is not bound
Type (debug) to enter the debugger.


S-expressions: quote & friends

To evaluate S-expressions that are not self-evaluating, we must use the form quote:
▶ The special form quote can be written in two ways:
  ▶ (quote <sexpr>)
  ▶ '<sexpr>
  Both forms are equivalent
▶ When you type abc at the Scheme prompt, you’re evaluating the variable abc
▶ When you type 'abc at the Scheme prompt, you’re evaluating the literal symbol abc
▶ The value of the literal symbol abc is just itself, which is why when you type 'abc at the Scheme prompt, you get back abc


S-expressions: quote & friends

▶ When you type () at the Scheme prompt, you’re evaluating an application with no function and no arguments! This is a syntax-error!
▶ When you type '() at the Scheme prompt, you’re evaluating a literal empty list
▶ The value of the literal empty list is just itself, which is why when you type '() at the Scheme prompt, you get back ()


S-expressions: quote & friends

▶ When you type (a b c) at the Scheme prompt, you’re evaluating the application of the procedure a to the arguments b and c, which are variables
▶ When you type '(a b c) at the Scheme prompt, you’re evaluating the literal list (a b c)
▶ The value of the literal list (a b c) is just (a b c), which is why when you type '(a b c) at the Scheme prompt, you get back (a b c).
▶ Quoting a self-evaluating S-expression is possible, and redundant:

> '2
2
> (+ '2 '3)
5


S-expressions: quote & friends

So what does quote do?
▶ The quote form does nothing
  ▶ It is not a procedure
  ▶ It doesn’t take an argument
  ▶ It delimits a constant, literal S-expression
▶ The syntactic function of quote in Scheme is the same as the syntactic function of braces { ... } in C in defining literal data:

const int A[] = {4, 9, 6, 3, 5, 1};


S-expressions: quote & friends

Meet quasiquote
▶ Similarly to quote, the form quasiquote can be written in two ways:
  ▶ (quasiquote <sexpr>)
  ▶ `<sexpr>
▶ quasiquote is also used to define data:
  ▶ `abc is the same as 'abc
  ▶ `(a b c) is the same as '(a b c)
▶ But quasiquote has two neat tricks!


S-expressions: quote & friends

Meet quasiquote
▶ The following two forms may occur within a quasiquote-expression:
  ▶ The unquote form:
    ▶ (unquote <sexpr>)
    ▶ ,<sexpr>
  ▶ The unquote-splicing form:
    ▶ (unquote-splicing <sexpr>)
    ▶ ,@<sexpr>
▶ Both unquote & unquote-splicing are used to mix dynamic data into the data defined with quasiquote


S-expressions: quote & friends

Meet quasiquote

> '(a (+ 1 2 3) b)
(a (+ 1 2 3) b)
> '(a ,(+ 1 2 3) b)
(a ,(+ 1 2 3) b)
> `(a (+ 1 2 3) b)
(a (+ 1 2 3) b)
> `(a ,(+ 1 2 3) b)
(a 6 b)
> `(a ,(append '(x y) '(z w)) b)
(a (x y z w) b)
> `(a ,@(append '(x y) '(z w)) b)
(a x y z w b)


S-expressions: quote & friends

Meet quasiquote
▶ The expression `(a ,(append '(x y) '(z w)) b) is equivalent to (cons 'a (cons (append '(x y) '(z w)) '(b)))
▶ The expression `(a ,@(append '(x y) '(z w)) b) is equivalent to (cons 'a (append (append '(x y) '(z w)) '(b)))
▶ The difference between unquote & unquote-splicing is that
  ▶ unquote mixes in an expression using cons
  ▶ unquote-splicing mixes in an expression using append
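
The cons-versus-append distinction can be modelled in ocaml. This is a sketch over a hypothetical, minimal S-expression type; list_to_sexpr and sexpr_append are illustrative helpers, not part of any library.

```ocaml
(* Hypothetical, minimal S-expression type, for illustration only. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

(* Build a proper list from an ocaml list of S-expressions. *)
let rec list_to_sexpr = function
  | [] -> Nil
  | x :: rest -> Pair (x, list_to_sexpr rest)

(* Append two proper lists represented as nested pairs. *)
let rec sexpr_append s1 s2 =
  match s1 with
  | Nil -> s2
  | Pair (car, cdr) -> Pair (car, sexpr_append cdr s2)
  | Symbol _ -> invalid_arg "sexpr_append: not a proper list"

(* The dynamic value being mixed in: the list (x y z w). *)
let xyzw = list_to_sexpr [Symbol "x"; Symbol "y"; Symbol "z"; Symbol "w"]

(* The rest of the template: the list (b). *)
let rest = list_to_sexpr [Symbol "b"]

(* unquote: the value becomes one element, via cons. *)
let with_unquote = Pair (Symbol "a", Pair (xyzw, rest))

(* unquote-splicing: the elements of the value are spliced in, via append. *)
let with_splicing = Pair (Symbol "a", sexpr_append xyzw rest)
```

Here with_unquote is a list of three elements, the second of which is itself a list, whereas with_splicing is a flat list of six symbols.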


S-expressions: quote & friends

Meet quasiquote
▶ Together, quasiquote, unquote, & unquote-splicing are known as the quasiquote mechanism or the backquote mechanism
▶ The quasiquote mechanism allows us to create data by template, that is, by specifying the shape of the data
▶ In Scheme, convenient ways to create data translate immediately into convenient ways to create code
▶ Therefore we expect the quasiquote mechanism to have useful applications within programming languages
▶ We can turn code that computes something into code that shows us a computation…


S-expressions: quote & friends

Consider the familiar factorial function:

(define fact
  (lambda (n)
    (if (zero? n)
        1
        (* n (fact (- n 1))))))


S-expressions: quote & friends

We use the quasiquote mechanism to convert the application (* n (fact (- n 1))) into code that describes what factorial does:

(define fact
  (lambda (n)
    (if (zero? n)
        1
        `(* ,n ,(fact (- n 1))))))

Running (fact 5) now gives:

> (fact 5)
(* 5 (* 4 (* 3 (* 2 (* 1 1)))))

As you can see, factorial now produces a trace of the computation, rather than a number.
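
The same idea, building a description of the computation instead of performing it, can be sketched in ocaml. The expr type and the function names below are assumptions made for illustration; ocaml has no quasiquote, so the template is written out with constructors.

```ocaml
(* Hypothetical expression type for the trace; names are illustrative. *)
type expr =
  | Num of int
  | Mul of expr * expr

(* Instead of multiplying, build the expression that would be multiplied. *)
let rec fact_trace n =
  if n = 0 then Num 1
  else Mul (Num n, fact_trace (n - 1))

(* Render the trace in prefix notation, as Scheme would print it. *)
let rec expr_to_string = function
  | Num n -> string_of_int n
  | Mul (a, b) ->
    Printf.sprintf "(* %s %s)" (expr_to_string a) (expr_to_string b)

(* Evaluating the trace recovers the ordinary factorial. *)
let rec eval = function
  | Num n -> n
  | Mul (a, b) -> eval a * eval b
```

Rendering fact_trace 5 gives the text printed by the Scheme version above, and evaluating it gives 120, which shows that the trace loses no information.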


S-expressions: quote & friends

We are now going to use the quasiquote mechanism to get Scheme to teach us about the structure of S-expressions.
Consider the following code:

(define foo
  (lambda (e)
    (cond ((pair? e)
           (cons (foo (car e))
                 (foo (cdr e))))
          ((or (null? e) (symbol? e)) e)
          (else e))))

What does this program do?


S-expressions: quote & friends

Let’s call foo with some arguments:

> (foo 'a)
a
> (foo 123)
123
> (foo '())
()
> (foo '(a b c))
(a b c)


S-expressions: quote & friends

Looking over the code again

(define foo
  (lambda (e)
    (cond ((pair? e)
           (cons (foo (car e))
                 (foo (cdr e))))
          ((or (null? e) (symbol? e)) e)
          (else e))))

we notice that:
▶ The 2nd and 3rd ribs of the cond overlap [we could have removed the 2nd]
▶ All atoms are left unchanged
▶ All pairs are duplicated, while recursing over the car and cdr of the pair

So foo does nothing, though it does it recursively! ☺


S-expressions: quote & friends

We now use the quasiquote mechanism to cause foo to generate a trace:

(define foo
  (lambda (e)
    (cond ((pair? e)
           `(cons ,(foo (car e))
                  ,(foo (cdr e))))
          ((or (null? e) (symbol? e)) `',e)
          (else e))))


S-expressions: quote & friends

Running foo now gives us some interesting data:

> (foo 'a)
'a
> (foo '(a b c))
(cons 'a (cons 'b (cons 'c '())))
> (foo '(a 1 b 2))
(cons 'a (cons 1 (cons 'b (cons 2 '()))))
> (foo 123)
123
> (foo '((a b) (c d)))
(cons
  (cons 'a (cons 'b '()))
  (cons (cons 'c (cons 'd '())) '()))


S-expressions: quote & friends

▶ Using the quasiquote mechanism, we got foo to describe how S-expressions are created using the most basic API.
▶ We should really add support for proper lists and vectors!
▶ In fact, the name describe is far more appropriate than foo…

Let’s rewrite foo…


S-expressions: quote & friends

(define describe
  (lambda (e)
    (cond ((list? e)
           `(list ,@(map describe e)))
          ((pair? e)
           `(cons ,(describe (car e))
                  ,(describe (cdr e))))
          ((vector? e)
           `(vector
              ,@(map describe
                     (vector->list e))))
          ((or (null? e) (symbol? e)) `',e)
          (else e))))
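
For comparison, here is a sketch of describe in ocaml, returning the Scheme source text as a string. It works over a hypothetical S-expression type without vectors, and the helper proper_elements is an assumption introduced for this example. As in the Scheme version, the proper-list case is tried first, so the empty list is described as (list).

```ocaml
(* Hypothetical S-expression type, for illustration only. *)
type sexpr =
  | Nil
  | Number of int
  | Symbol of string
  | Pair of sexpr * sexpr

(* Collect the elements of a proper list; None for anything else. *)
let rec proper_elements = function
  | Nil -> Some []
  | Pair (car, cdr) ->
    (match proper_elements cdr with
     | Some rest -> Some (car :: rest)
     | None -> None)
  | _ -> None

(* Return Scheme source text that would construct the given S-expression. *)
let rec describe e =
  match e, proper_elements e with
  | Nil, _ -> "(list)"
  | Number n, _ -> string_of_int n
  | Symbol s, _ -> "'" ^ s
  | Pair _, Some elements ->
    "(list " ^ String.concat " " (List.map describe elements) ^ ")"
  | Pair (car, cdr), None ->
    "(cons " ^ describe car ^ " " ^ describe cdr ^ ")"
```

Because proper_elements is recomputed at every pair, this sketch favours clarity over efficiency.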


S-expressions: quote & friends

Running describe on various S-expressions is very instructive:

> (describe '(a b c))
(list 'a 'b 'c)
> (describe '#(a b c))
(vector 'a 'b 'c)
> (describe '(a b . c))
(cons 'a (cons 'b 'c))
> (describe ''a)
(list 'quote 'a)

Wait! What’s with the last example?!


S-expressions: quote & friends

Recall what we said about quote, quasiquote, unquote, & unquote-splicing:
▶ (quote <sexpr>) ≡ '<sexpr>
▶ (quasiquote <sexpr>) ≡ `<sexpr>
▶ (unquote <sexpr>) ≡ ,<sexpr>
▶ (unquote-splicing <sexpr>) ≡ ,@<sexpr>

Now we get to see this happen…


S-expressions: quote & friends

Now we get to see this happen:

> (describe ''<sexpr>)
(list 'quote '<sexpr>)
> (describe '`<sexpr>)
(list 'quasiquote '<sexpr>)
> (describe ',<sexpr>)
(list 'unquote '<sexpr>)
> (describe ',@<sexpr>)
(list 'unquote-splicing '<sexpr>)

Rule: Every Scheme expression used to be an S-expression when it was little!


S-expressions: quote & friends

Question
What is (length '''''''''''''''''moshe) ?
👎 17
👎 16
👎 Generates an error message!
👎 1
👍 2


S-expressions: quote & friends

Explanation
(length '''''''''''''''''moshe) is the same as (length '(quote <something>)), where <something> is '''''''''''''''moshe, but that really doesn’t matter! We are still computing the length of a list of size 2:
▶ The first element of the list is the symbol quote
▶ The second element of the list is '''''''''''''''moshe
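
The explanation can be checked with a small model of what the reader builds for each quote mark. This is an ocaml sketch over a hypothetical, minimal S-expression type; quote, quote_n, and length are illustrative names.

```ocaml
(* Hypothetical, minimal S-expression type, for illustration only. *)
type sexpr = Nil | Symbol of string | Pair of sexpr * sexpr

(* What the reader builds for a quoted x: the two-element list (quote x). *)
let quote x = Pair (Symbol "quote", Pair (x, Nil))

(* Apply the quote expansion n times. *)
let rec quote_n n x = if n = 0 then x else quote (quote_n (n - 1) x)

(* Length of a proper list; raises on anything else. *)
let rec length = function
  | Nil -> 0
  | Pair (_, cdr) -> 1 + length cdr
  | Symbol _ -> invalid_arg "length: not a proper list"
```

Evaluating the outermost quote leaves a datum with the remaining quote marks, which is quote_n applied to moshe. Its length is 2 however many quote marks remain, as long as there is at least one.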


S-expressions: quote & friends (continued)

Question
The structure of the S-expression ''a in Scheme is:
👎 Just the symbol a
👎 The proper list (quote . (a . ()))
👎 The proper list (quote . (quote . (a . ())))
👎 An invalid S-expression
👍 The nested proper list (quote . ((quote . (a . ())) . ()))



Chapter 2

Goals
🗸 The pipeline of the compiler
🗸 Introduction to syntactic analysis
☞ Further steps in ocaml

Agenda
☞ Ocaml
  ▶ Types
  ▶ References
  ▶ Modules & signatures
  ▶ Functional programming in ocaml


Introduction to ocaml (2)

Still need to cover
To program in ocaml effectively in this course, we still need to learn some additional topics:
▶ Defining new data types
▶ Assignments, side-effects

What we shan’t cover
Object Orientation: Once you’re comfortable with ocaml, you might like to pick up the object-oriented layer. As object-orientation goes, you should find it to be sophisticated and expressive.


Types

New types are defined using the type statement:

type fraction = {numerator : int; denominator : int};;

The above statement defines a new type fraction as a record consisting of two fields: numerator & denominator, both of type int.


Types (continued)

Once fraction has been defined, the underlying system recognizes it for all records with these fields & types:

# {numerator = 2; denominator = 3};;
- : fraction = {numerator = 2; denominator = 3}
# {denominator = 3; numerator = 2};;
- : fraction = {numerator = 2; denominator = 3}

Notice that the order of the fields in a record is immaterial, because the fields are accessed through their names, which are converted consistently into offsets.


Types (continued)

The type-inference engine in ocaml will correctly infer newly-defined types:

let add_fractions f1 f2 =
  match f1, f2 with
  | {numerator = n1; denominator = d1},
    {numerator = n2; denominator = d2} ->
    {numerator = n1 * d2 + n2 * d1;
     denominator = d1 * d2};;

And of course:

# add_fractions
    {numerator = 2; denominator = 3}
    {numerator = 4; denominator = 5};;
- : fraction = {numerator = 22; denominator = 15}
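
Note that add_fractions does not reduce its result: adding 1/2 to 1/2 gives 4/4. A possible follow-up, shown here only as a sketch, is a normalize function built on Euclid's algorithm. Both gcd and normalize are hypothetical helpers that do not appear in the slides, and normalize assumes a non-zero denominator.

```ocaml
type fraction = {numerator : int; denominator : int}

(* Same addition as above, written with pattern-matching on the arguments. *)
let add_fractions
    {numerator = n1; denominator = d1}
    {numerator = n2; denominator = d2} =
  {numerator = n1 * d2 + n2 * d1; denominator = d1 * d2}

(* Euclid's algorithm on non-negative integers. *)
let rec gcd a b = if b = 0 then a else gcd b (a mod b)

(* Hypothetical helper: reduce to lowest terms, keeping the denominator
   positive. Assumes the denominator is non-zero. *)
let normalize {numerator = n; denominator = d} =
  let g = gcd (abs n) (abs d) in
  let sign = if d < 0 then -1 else 1 in
  {numerator = sign * n / g; denominator = sign * d / g}
```

The type-inference engine identifies the arguments and results of both functions as fraction from the field names alone, without any annotation.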


Types (continued)

We can define disjoint types as follows:

type number =
  | Int of int
  | Frac of fraction
  | Float of float;;

Think of the | as disjunction. The initial | is optional in ocaml.


Types (continued)

We can now define a list of numbers as follows:

# [Int 3;
   Frac {numerator = 3;
         denominator = 4};
   Float (4.0 *. atan(1.0))];;
- : number list =
[Int 3;
 Frac {numerator = 3;
       denominator = 4};
 Float 3.14159265358979312]

Notice that ocaml had no trouble identifying each of the three elements of the list as belonging to type number.


Types (continued)

Working with disjoint types
Use match to dispatch over the corresponding type constructor, and make sure you handle each and every possibility!

let number_to_string x =
  match x with
  | Int n -> Format.sprintf "%d" n
  | Frac {numerator = num; denominator = den} ->
    Format.sprintf "%d/%d" num den
  | Float x -> Format.sprintf "%f" x;;


Types (continued)

Working with disjoint types (continued)
And here’s how it looks:

# number_to_string (Int 234);;
- : string = "234"
# number_to_string (Frac {numerator = 2; denominator = 5});;
- : string = "2/5"
# number_to_string (Float 234.234);;
- : string = "234.234000"
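
A second dispatch over the same type, sketched below, follows the same pattern. The function number_to_float is a hypothetical example that does not appear in the slides. If one of the three cases were left out, the compiler would warn that the pattern-matching is not exhaustive, which is how it helps you handle each and every possibility.

```ocaml
type fraction = {numerator : int; denominator : int}

type number =
  | Int of int
  | Frac of fraction
  | Float of float

(* Hypothetical example: convert any number to a float,
   with one case per type constructor. *)
let number_to_float = function
  | Int n -> float_of_int n
  | Frac {numerator = num; denominator = den} ->
    float_of_int num /. float_of_int den
  | Float x -> x
```

The function keyword used here is shorthand for taking one argument and matching on it immediately.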


References

Let us take another look at the record-type. Recall the definition of fraction:

# type fraction = {numerator : int; denominator : int};;
type fraction = { numerator : int; denominator : int; }

In the function add_fractions we used pattern-matching to access the record-fields.


References (continued)

Ocaml lets you access fields directly, using the dot-notation that is familiar from object-oriented programming:

# {numerator = 3; denominator = 5}.numerator;;
- : int = 3
# {numerator = 3; denominator = 5}.denominator;;
- : int = 5


References (continued)

Ocaml offers a special record-type known as a reference.
▶ References are derived types. For any type α, we can have a type α ref.
▶ References are records with a single mutable field, contents
▶ References have a special syntax ! to dereference the field:

# {contents = 1234};;
- : int ref = {contents = 1234}
# {contents = 1234}.contents;;
- : int = 1234
# ! {contents = 1234};;
- : int = 1234


References (continued)

▶ References have a special syntax := for assignment
▶ This is how assignments are managed in ocaml

# let x = ref 1234;;
val x : int ref = {contents = 1234}
# x;;
- : int ref = {contents = 1234}
# !x;;
- : int = 1234
# x := 4567;;
- : unit = ()
# x;;
- : int ref = {contents = 4567}
# !x;;
- : int = 4567


References (continued)

▶ It is not possible to perform assignments on variables
▶ It is only possible to change the fields of reference types

# let x = "abc";;
val x : string = "abc"
# x := "def";;
Characters 0-1:
  x := "def";;
  ^
Error: This expression has type string
       but an expression was expected of type
         'a ref
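
A common use of references is to keep state that survives between calls. The following sketch shows a counter; make_counter is a hypothetical name chosen for this example. The variable count is never reassigned: only the contents of the reference it names change.

```ocaml
(* Hypothetical example: each call to make_counter yields a fresh,
   independent counter. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    count := !count + 1;   (* update the contents of the reference *)
    !count                 (* return the new contents *)

let next = make_counter ()
let first = next ()
let second = next ()
```

After these definitions, first is 1 and second is 2, and a second counter made by another call to make_counter would start again from 1.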


References (continued)

▶ You can define a reference type of any other type, including other reference types:

# let x = ref (ref 1234);;
val x : int ref ref = {contents = {contents = 1234}}
# x := ref 5678;;
- : unit = ()
# x;;
- : int ref ref = {contents = {contents = 5678}}
# !x := 9876;;
- : unit = ()
# x;;
- : int ref ref = {contents = {contents = 9876}}


Modules, signatures, functors

Modules
▶ A module is a way of packaging functions, classes, variables, & types
▶ A signature is the type of a module
  ▶ Visibility of a module can be restricted through the signature
▶ Functors are functions from functors/modules to functors/modules

Goals
▶ Learn to work with existing modules
▶ Learn to write your own modules


Modules, signatures, functors (continued)

We define the function hyp to compute the third side of a triangle (the hypotenuse, when the angle is a right angle), given two sides and the angle between them (cosine law). We use the auxiliary function square:

# module M = struct
    let square x = x *. x
    let hyp a b theta =
      sqrt((square a) +. (square b) -.
           2.0 *. a *. b *. (cos theta))
  end;;
module M : sig
  val square : float -> float
  val hyp : float -> float -> float -> float
end


Modules, signatures, functors (continued)

Both M.square and M.hyp are visible:

# M.hyp;;
- : float -> float -> float -> float = <fun>
# M.square;;
- : float -> float = <fun>
# M.square 2.0;;
- : float = 4.
# M.hyp 3.5 5.6 0.645771823239;;
- : float = 3.50763282088818817


Modules, signatures, functors (continued)

We define the module type based on the returned signature of M, but with the square function removed:

# module type SigHyp = sig
    val hyp : float -> float -> float -> float
  end;;
module type SigHyp = sig
  val hyp : float -> float -> float -> float
end
# module M : SigHyp = struct
    let square x = x *. x
    let hyp a b theta =
      sqrt((square a) +. (square b) -.
           2.0 *. a *. b *. (cos theta))
  end;;
module M : SigHyp


Modules, signatures, functors (continued)

Visibility is now restricted:
▶ M.hyp is visible from outside M
▶ M.square is not visible from outside M
▶ Functions visible from outside may use functions visible only from inside

# M.hyp;;
- : float -> float -> float -> float = <fun>
# M.square;;
Characters 0-8:
  M.square;;
  ^^^^^^^^
Error: Unbound value M.square
# M.hyp 3.5 5.6 0.645771823239;;
- : float = 3.50763282088818817


Modules, signatures, functors (continued)

Summary
▶ Modules & signatures are the way to package functions & control visibility
▶ Convenient, super-efficient, safe
▶ No need to use local, nested functions to manage visibility
▶ Always use signatures to control visibility!

Learn on your own
▶ Modules can contain types too, and be used to parameterize code with types
  ▶ Simpler & better than generics & templates
▶ Functors map modules/functors ⇒ modules/functors
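
The two "learn on your own" points can be sketched in a few lines. This is only an illustration: the names ORDERED, MakeMax, IntOrd and IntMax are hypothetical and do not come from the course code.

```ocaml
(* Hypothetical example: none of these names appear in the course code. *)

(* A signature that contains a type as well as a function *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: maps any module matching ORDERED to a new module *)
module MakeMax (O : ORDERED) = struct
  let max a b = if O.compare a b >= 0 then a else b
end

(* One module that matches ORDERED *)
module IntOrd = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end

(* Applying the functor yields a module specialized to integers *)
module IntMax = MakeMax (IntOrd)

let () = Printf.printf "%d\n" (IntMax.max 3 7)
```

The code in MakeMax is written once against the abstract type O.t, and each application of the functor specializes it to a concrete type.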


Further reading

🕮 The Objective Caml Programming Language, Chapter 12
🔗 An online tutorial on ocaml modules


Parsing Techniques

Dozens of parsing algorithms are known:
▶ Parsing algorithms are tailored to a specific kind of grammar
  ▶ Different kinds of grammars can be parsed by different algorithms
  ▶ Different kinds of grammars have different levels of complexity on the Chomsky Hierarchy
▶ Most programming languages can be described using context-free grammars
  ▶ Some older languages can only be described using context-sensitive grammars


Parsing Techniques (continued)

Context-free Grammars (CFGs)
A CFG is a structure of the form G = ⟨V, Σ, R, S⟩:
▶ V is a set of non-terminals
▶ Σ is a set of terminals, or tokens
▶ R is a relation in V × (V ∪ Σ)∗
  ▶ Members of R are called production rules or rewrite rules
▶ S is the initial non-terminal
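
As a concrete reading of this definition, a grammar can be written down as plain data. The sketch below is one possible encoding, not the course's: the type names symbol and cfg, the field names, and the helper rhs_of are all assumptions. V and Σ are left implicit in the symbols that the rules mention.

```ocaml
(* Hypothetical encoding of G = <V, Sigma, R, S>; the type and field
   names are assumptions made for illustration. *)
type symbol =
  | NT of string   (* a member of V *)
  | T of char      (* a member of Sigma *)

type cfg = {
  rules : (string * symbol list) list;  (* R: pairs of LHS and RHS *)
  start : string;                       (* S *)
}

(* The grammar  A -> ( A ) | epsilon  *)
let parens = {
  rules = [ ("A", [T '('; NT "A"; T ')']);
            ("A", []) ];                (* the empty RHS stands for epsilon *)
  start = "A";
}

(* All right-hand sides of a given non-terminal *)
let rhs_of g nt =
  List.map snd (List.filter (fun (lhs, _) -> lhs = nt) g.rules)
```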


Parsing Techniques (continued)

Context-free Grammars (conveniences)
▶ We abbreviate the two productions ⟨A,X⟩ , ⟨A,Y⟩ ∈ R with ⟨A,X | Y⟩ (disjunction)
▶ We abbreviate the three productions ⟨A,X⟩ , ⟨X, ε⟩ , ⟨X,BX⟩ ∈ R, where X has no other productions, with ⟨A,B∗⟩ (Kleene-star)
▶ We abbreviate the three productions ⟨A,X⟩ , ⟨X,B⟩ , ⟨X,BX⟩ ∈ R, where X has no other productions, with ⟨A,B+⟩ (Kleene-plus)
▶ We abbreviate the two productions ⟨A, ε⟩ , ⟨A,B⟩ ∈ R with ⟨A,B?⟩ (maybe)


Parsing Techniques (continued)

The two basic approaches to parsing CFGs are top-down & bottom-up:

Top-down algorithms
▶ Start with the initial non-terminal
▶ Rewrite a non-terminal (the LHS of a production) with its RHS, matching the input stream of tokens
▶ Keep rewriting until the entire input stream is matched
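
The top-down steps can be sketched as a tiny hand-written recognizer. The grammar A → ( A ) A | ε and the names Parse_error, parse_a and accepts are hypothetical, chosen only to illustrate the idea. Each call to parse_a rewrites the non-terminal A with one of its right-hand sides, guided by the next input character.

```ocaml
(* Recognizer for the hypothetical grammar  A -> ( A ) A | epsilon.
   parse_a consumes a prefix derived from A and returns what remains. *)
exception Parse_error

let rec parse_a input =
  match input with
  | '(' :: rest ->
     (* rewrite A with the RHS  ( A ) A  *)
     let rest = parse_a rest in
     (match rest with
      | ')' :: rest -> parse_a rest
      | _ -> raise Parse_error)
  | _ -> input   (* rewrite A with epsilon: consume nothing *)

(* The input is accepted only if all of it is matched *)
let accepts str =
  let chars = List.init (String.length str) (String.get str) in
  try parse_a chars = [] with Parse_error -> false
```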


Parsing Techniques (continued)

The two basic approaches to parsing CFGs are top-down & bottom-up:

Bottom-up algorithms
▶ Start with the input stream of tokens
▶ Find a rewrite rule where the RHS matches sequences in the input, and rewrite them to the LHS, reducing several items to a single non-terminal
▶ Keep rewriting until the entire input stream has been reduced to the initial non-terminal


Parsing Techniques (continued)

How most parsing algorithms are used
▶ Describe the grammar of the language using a DSL for some restricted CFG
  ▶ Example: Backus-Naur Form (BNF)
▶ Associate actions with each production rule:
  ▶ How to build the AST when a specific rule is matched
▶ A parser generator (e.g., yacc, bison, antlr, etc) compiles the grammar:
  ▶ Performing various optimizations
  ▶ Generating code in some language (C, Java, ocaml, etc)
  ▶ This code is the parser
▶ Calling the parser on some input returns an AST


Parsing Techniques (continued)

Goals of parsing algorithms
▶ Minimal restrictions on the grammar
▶ Avoid backtracking as much as possible
▶ Maximum optimizations of the parser


Parsing Combinators

A technique for embedding a specification of a grammar into a programming language:
▶ Parsers for larger languages are composed from parsers for smaller languages
▶ The grammar can be written & debugged bottom-up
▶ The parsers are first-class objects:
  ▶ We get to use abstraction to create complex parsers quickly & simply
  ▶ Effectively re-use common sub-languages
▶ Simple to understand & implement
▶ Very rapid development


Parsing Combinators (continued)

Parsing combinators do have some disadvantages:
▶ The grammar is embedded as-is:
  ▶ As much backtracking as implied by the grammar: Rewrite rules that have large common prefixes are going to require plenty of backtracking:

      A → xByCzDt
      A → xByCzDw
      · · ·

  ▶ No optimizations or transformations are performed on it!
▶ ε-productions & left-recursion result in infinite loops
  ▶ We need to eliminate these manually!
▶ Can produce inefficient parsers rather efficiently! 😉
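
The cost of a common prefix, and the manual fix of left-factoring, can be sketched with the const, caten and disj combinators that are introduced later in this chapter (they are restated here so that the example stands alone). The helper char, the counter prefix_calls, and the parsers prefix, unfactored and factored are hypothetical names, and the grammar is a shortened stand-in for the rules above.

```ocaml
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match
let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)
let disj nt1 nt2 s = try nt1 s with X_no_match -> nt2 s
let char c = const (fun ch -> ch = c)

(* Count how often the shared prefix "xy" is parsed *)
let prefix_calls = ref 0
let prefix s = incr prefix_calls; caten (char 'x') (char 'y') s

(* A -> x y t | x y w, embedded as-is *)
let unfactored = disj (caten prefix (char 't')) (caten prefix (char 'w'))

(* A -> x y (t | w), left-factored by hand *)
let factored = caten prefix (disj (char 't') (char 'w'))

let () =
  let input = ['x'; 'y'; 'w'] in
  ignore (unfactored input);
  Printf.printf "unfactored: prefix parsed %d times\n" !prefix_calls;
  prefix_calls := 0;
  ignore (factored input);
  Printf.printf "factored: prefix parsed %d times\n" !prefix_calls
```

On the input xyw the unfactored parser matches the prefix, fails on t, backtracks, and matches the prefix again; the factored parser matches it once.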


Parsing Combinators (continued)

Nevertheless:
▶ Parsing combinators are a very simple way to learn about grammars:
  ▶ No complex algorithms are necessary!
▶ The easiest way to design complex grammars & their parsers, because abstraction:
  ▶ shortens & simplifies the code
  ▶ encourages re-use & consistency
▶ Optimizations can always be done manually!


Parsing Combinators (continued)

Our parsing combinators take lists of characters for input, and return an AST. We start with code to convert strings to lists of characters:

let string_to_list str =
  let rec loop i limit =
    if i = limit then []
    else (String.get str i) :: (loop (i + 1) limit)
  in
  loop 0 (String.length str);;


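Parsing Combinators (continued)

We shall also want to generate a string from a list of characters:

let list_to_string s =
  (* Strings are immutable in current OCaml, so we fill a mutable
     Bytes buffer and convert it to a string at the end *)
  let rec loop s n =
    match s with
    | [] -> Bytes.make n '?'
    | car :: cdr ->
       let result = loop cdr (n + 1) in
       Bytes.set result n car;
       result
  in
  Bytes.to_string (loop s 0);;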


Parsing Combinators (continued)

Sometimes our parsers must fail on their input. When this happens, we raise an exception (which in other languages is called throwing an exception).
We should therefore define an exception:

exception X_no_match;;


Parsing Combinators (continued)

Parsing combinators are compositional. This means
▶ We build parsers of large languages by combining parsers for smaller [sub-]languages
▶ The procedures that combine parsers are called parsing combinators (PCs)
▶ But we must start by being able to parse single characters
  ▶ All other parsers are built on top of such simple parsers for single characters


Parsing Combinators (continued)

The const PC takes a predicate (char -> bool), and returns a parser that recognizes any single character satisfying that predicate:

let const pred =
  function
  | [] -> raise X_no_match
  | e :: s ->
     if (pred e) then (e, s)
     else raise X_no_match;;


Parsing Combinators (continued)

We define the non-terminal that recognizes the capital letter 'A' by calling const with a predicate that returns true if its argument is equal to 'A':

# let ntA = const (fun ch -> ch = 'A');;
val ntA : char list -> char * char list = <fun>

Notice that ntA
▶ …takes a list of characters
▶ …returns a pair of what it matched, and the remaining characters

This is the structure of all parsers written using PCs
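
Because const is parameterized by a predicate, small helpers for common cases are easy to derive from it. The sketch below assumes the helper names char, range, char_ci and nt_lower, which are hypothetical and are not defined in the material above.

```ocaml
exception X_no_match

let const pred = function
  | [] -> raise X_no_match
  | e :: s -> if pred e then (e, s) else raise X_no_match

(* Hypothetical helpers built on const *)

(* Recognize exactly the character c *)
let char c = const (fun ch -> ch = c)

(* Recognize any character between lo and hi, inclusive *)
let range lo hi = const (fun ch -> lo <= ch && ch <= hi)

(* Recognize c regardless of case; the matched character is returned as-is *)
let char_ci c =
  const (fun ch -> Char.lowercase_ascii ch = Char.lowercase_ascii c)

let nt_lower = range 'a' 'z'
```

Each helper returns a parser with the same structure as ntA: it takes a list of characters and returns the matched character paired with the rest of the input.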


Parsing Combinators (continued)

Using ntA
# ntA ['A'; 'B'; 'C'];;
- : char * char list = ('A', ['B'; 'C'])
# ntA [];;
Exception: PC.X_no_match.
# ntA ['a'; 'A'];;
Exception: PC.X_no_match.

▶ We only match the head of the input
▶ Obviously, ntA fails on an empty list


Parsing Combinators (continued)

▶ Testing our parsers by applying them to lists is no fun
  ▶ It’s a pain to type lists of characters!
▶ Let’s automate things a bit:

let test_string nt str =
  let (e, s) = (nt (string_to_list str)) in
  (e, (Printf.sprintf "->[%s]" (list_to_string s)));;


Parsing Combinators (continued)

We can now test more easily:

# test_string ntA "";;
Exception: PC.X_no_match.
# test_string ntA "Abc";;
- : char * string = ('A', "->[bc]")

This is only for testing! When we deploy our parser, we’ll call it directly.


Parsing Combinators (continued)

Constant parsers are not very useful! Let’s consider catenation:

let caten nt1 nt2 =
  fun s ->
  let (e1, s) = (nt1 s) in
  let (e2, s) = (nt2 s) in
  ((e1, e2), s);;

▶ We try to parse the head of s using nt1
  ▶ If we succeed, we get e1 and the remaining chars s
▶ We try to parse the head of s (what remained after nt1) using nt2
  ▶ If we succeed, we get e2 and the remaining chars s
▶ We return the pair of e1 & e2, as well as the remaining chars


Parsing Combinators (continued)

We define and test the parser for A followed by B:

# let ntAB =
    caten (const (fun ch -> ch = 'A'))
          (const (fun ch -> ch = 'B'));;
val ntAB : char list -> (char * char) * char list = <fun>
# test_string ntAB "ABC";;
- : (char * char) * string = (('A', 'B'), "->[C]")
# test_string ntAB "abc";;
Exception: PC.X_no_match.
# test_string ntAB "Abc";;
Exception: PC.X_no_match.
# test_string ntAB "AB";;
- : (char * char) * string = (('A', 'B'), "->[]")
# test_string ntAB "A Bcdef";;
Exception: PC.X_no_match.


Parsing Combinators (continued)

We now consider disjunction of two parsers:

let disj nt1 nt2 =
  fun s ->
  try (nt1 s)
  with X_no_match -> (nt2 s);;

▶ We try to parse the head of s using nt1
  ▶ If we succeed, then the call to nt1 returns normally
  ▶ If we fail, we try to parse the head of s using nt2


Parsing Combinators (continued)

We define and test the parser for either A or a:

# let ntA_or_a =
    disj (const (fun ch -> ch = 'A'))
         (const (fun ch -> ch = 'a'));;
val ntA_or_a : char list -> char * char list = <fun>
# test_string ntA_or_a "";;
Exception: PC.X_no_match.
# test_string ntA_or_a "this won't work either";;
Exception: PC.X_no_match.
# test_string ntA_or_a "A nice example";;
- : char * string = ('A', "->[ nice example]")
# test_string ntA_or_a "a nice example";;
- : char * string = ('a', "->[ nice example]")


Parsing Combinators (continued)

What next?
▶ Some simple parsers
▶ Learn about the algebra of PCs
▶ Learn of new PC operators
▶ Learn how to use abstraction to make our life simpler


Some simple parsers

let nt_epsilon s = ([], s);;

let nt_none _ = raise X_no_match;;

let nt_end_of_input = function
  | [] -> ([], [])
  | _ -> raise X_no_match;;

▶ nt_epsilon is the parser that recognizes ε-productions
▶ nt_none is the parser that always fails
▶ nt_end_of_input is the parser that recognizes the end of the input stream (and fails otherwise)


Parsing Combinators (continued)

What next?
🗸 Some simple parsers
▶ Learn about the algebra of PCs
▶ Learn of new PC operators
▶ Learn how to use abstraction to make our life simpler


The Algebra of PCs

Why do nt_epsilon & nt_end_of_input match with the empty list []?
This has to do with the Algebra of parsing combinators:
▶ What is the unit element of catenation?
  ▶ Answer: r = ε
  ▶ We’re looking for a non-terminal r such that for any s, we have rs = sr = s…
▶ This means that nt_epsilon is the unit element for caten:
  ▶ caten nt_epsilon nt ≡ caten nt nt_epsilon ≡ nt
▶ Both nt_epsilon & nt_end_of_input are used ’til the end of something
  ▶ The natural operation is to create a list of all things until ε or the end-of-input is reached
  ▶ The unit element for append on lists is the empty list
  ▶ Ergo, it is natural to match [] when either condition is encountered


The Algebra of PCs (continued)

Similarly, nt_none is the unit element in the algebra of disjunction:

disj nt nt_none ≡ disj nt_none nt ≡ nt

☞ Later on, we shall use the algebra of PCs together with folding operations to create complex parsers easily
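
One way the unit elements combine with folding is sketched below: a list of parsers is folded with disj starting from nt_none, or with caten starting from nt_epsilon. The names disj_list, caten_list, char, word and nt_keyword are assumptions made for this illustration; the course package may name or define such helpers differently.

```ocaml
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match
let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)
let disj nt1 nt2 s = try nt1 s with X_no_match -> nt2 s
let pack nt f s = let (e, s) = nt s in (f e, s)
let nt_epsilon s = ([], s)
let nt_none _ = raise X_no_match
let char c = const (fun ch -> ch = c)

(* Disjunction of a list of parsers: fold disj, starting from its unit *)
let disj_list nts = List.fold_right disj nts nt_none

(* Catenation of a list of parsers: fold caten, starting from its unit,
   and pack each pair (e, es) back into the list e :: es *)
let caten_list nts =
  List.fold_right
    (fun nt rest -> pack (caten nt rest) (fun (e, es) -> e :: es))
    nts
    nt_epsilon

(* A parser for a fixed word, built from one parser per character *)
let word str =
  caten_list (List.init (String.length str) (fun i -> char str.[i]))

let nt_keyword = disj_list [word "let"; word "in"; word "fun"]
```

Starting the fold from the unit element means the empty list of parsers needs no special case: disj_list [] always fails, and caten_list [] matches ε and returns [].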


Parsing Combinators (continued)

What next?
🗸 Some simple parsers
🗸 Learn about the algebra of PCs
▶ Learn of new PC operators
▶ Learn how to use abstraction to make our life simpler


New PC Operators

Identifying the characters, or pairs of characters, etc that match a grammar is often not enough:
▶ We want to be able to create an AST for that piece of syntax
▶ We do this by specifying postprocessing or callback functions over the expression that was matched.
▶ In our package, the PC that performs this is called pack

let pack nt f =
  fun s ->
  let (e, s) = (nt s) in
  ((f e), s);;

▶ pack takes a non-terminal nt and a function f
  ▶ returns a parser that recognizes the same language as nt
  ▶ …but which applies f to whatever was matched


Parsing combinators (continued)

Example: Identifying digits
# let nt_digit_0_to_9 =
    const (fun ch -> '0' <= ch && ch <= '9');;
val nt_digit_0_to_9 :
  char list -> char * char list = <fun>
# test_string nt_digit_0_to_9 "234";;
- : char * string = ('2', "->[34]")
# let nt_digit_0_to_9 =
    pack (const (fun ch -> '0' <= ch && ch <= '9'))
         (fun ch -> (int_of_char ch) - ascii_0);;
val nt_digit_0_to_9 :
  char list -> int * char list = <fun>
# test_string nt_digit_0_to_9 "234";;
- : int * string = (2, "->[34]")
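
The transcript uses ascii_0 without showing its definition. Presumably it is the character code of '0'; under that assumption, a self-contained version of the packed digit parser could look like this:

```ocaml
exception X_no_match

let const pred = function
  | [] -> raise X_no_match
  | e :: s -> if pred e then (e, s) else raise X_no_match
let pack nt f s = let (e, s) = nt s in (f e, s)

(* Assumption: ascii_0 is the character code of '0', i.e. 48 *)
let ascii_0 = int_of_char '0'

(* Recognize one decimal digit and return its numeric value *)
let nt_digit_0_to_9 =
  pack (const (fun ch -> '0' <= ch && ch <= '9'))
       (fun ch -> (int_of_char ch) - ascii_0)
```

Subtracting the code of '0' works because the digits '0' through '9' have consecutive character codes.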


Recursive productions

▶ Grammars are often recursive or mutually-recursive:
  ▶ The non-terminal on the LHS of a production often appears on the RHS (recursion)
  ▶ The non-terminal on the LHS of a production often appears in one of the RHSs of the transitive-reflexive closure of the relation (mutual recursion)
▶ Currently, we are unable to describe recursive rules using PCs


Recursive productions (continued)

We are unable to describe recursive rules using PCs:

⟨A⟩ → ( (⟨A⟩∗|ε) )

▶ The non-terminal A
▶ The open-parenthesis token
▶ The close-parenthesis token

▶ Never mind that we don’t yet have star…
▶ We can’t use nt_A before it’s defined!


Recursive productions (continued)

We are unable to describe recursive rules using PCs:

let nt_A =
  caten (const (fun ch -> ch = '('))
        (caten (disj (star nt_A)
                     nt_epsilon)
               (const (fun ch -> ch = ')')));;

▶ Never mind that we don’t yet have star…
▶ We can’t use nt_A before it’s defined!


Recursive productions (continued)

We are unable to describe recursive rules using PCs:
▶ The problem is not specific to parsing combinators.
▶ For example, you couldn’t define in Scheme:

    (define f (g (h f)))

  because you can’t use something before it’s defined! (Ok, in some languages you can!)
▶ So how are recursive definitions possible at all?
  ▶ When you define a recursive function you are not using the function before it’s defined
  ▶ You are using the address of the function before the function is defined
▶ Recursive functions are circular data structures:
  ▶ The language definition permits you to define these particular circular structures statically, rather than at run-time


Parsing combinators (continued)

To implement recursive parsers, we need to delay the evaluation of the recursive non-terminal
▶ “Wrap it in a lambda…”

let delayed thunk =
  fun s -> thunk() s;;

▶ A thunk is a procedure that takes zero arguments
▶ Thunks are used to delay evaluation
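
A small sketch of why the delay matters. The grammar N → [ N ] | ε and the names make_nt_depth, nt_depth and char are hypothetical. If the recursive occurrence were written as make_nt_depth () instead of delayed make_nt_depth, then building the parser would call make_nt_depth again immediately, and would never terminate.

```ocaml
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match
let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)
let disj nt1 nt2 s = try nt1 s with X_no_match -> nt2 s
let pack nt f s = let (e, s) = nt s in (f e, s)
let nt_epsilon s = ([], s)
let delayed thunk s = thunk () s
let char c = const (fun ch -> ch = c)

(* N -> [ N ] | epsilon, returning the nesting depth.
   The recursive occurrence is wrapped in delayed, so make_nt_depth is
   only called again while input is being parsed, not while the parser
   is being constructed. *)
let rec make_nt_depth () =
  disj (pack (caten (char '[')
                    (caten (delayed make_nt_depth) (char ']')))
             (fun (_, (depth, _)) -> depth + 1))
       (pack nt_epsilon (fun _ -> 0))

let nt_depth = make_nt_depth ()
```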


Recursive productions (continued)

Example: Identifying digits (continued)
# let nt_natural =
    let rec make_nt_natural () =
      pack (caten nt_digit_0_to_9
                  (disj (delayed make_nt_natural)
                        nt_epsilon))
           (function (a, s) -> a :: s) in
    make_nt_natural();;
val nt_natural : char list -> int list * char list = <fun>

▶ Notice the packing function (function (a, s) -> a :: s)


Recursive productions (continued)

Example: Identifying digits (continued)
# test_string nt_natural "1234";;
- : int list * string = ([1; 2; 3; 4], "->[]")

We are not done yet:
▶ We got a list of digits, as opposed to a list of chars!
☞ We want to left-fold these digits into a number in base 10


Parsing combinators (continued)

We pack the list of digits using a left-fold:

# let nt_natural =
    let rec make_nt_natural () =
      pack (caten nt_digit_0_to_9
                  (disj (delayed make_nt_natural)
                        nt_epsilon))
           (function (a, s) -> a :: s) in
    pack (make_nt_natural())
         (fun s ->
           (List.fold_left
              (fun a b -> 10 * a + b)
              0
              s));;
val nt_natural : char list -> int * char list = <fun>

▶ Notice the type of the parser: char list -> int * char list


Recursive productions (continued)

Testing it:

# test_string nt_natural "1234";;
- : int * string = (1234, "->[]")


Recursive productions (continued)

The parser ntParen expresses the grammar of one set of arbitrarily-nested parentheses:

# let rec ntParen s =
    pack (caten (const (fun ch -> ch = '('))
                (caten (disj (delayed (fun _ -> ntParen))
                             (pack nt_epsilon
                                   (fun _ -> "ntParen")))
                       (const (fun ch -> ch = ')'))))
         (fun _ -> "ntParen") s ;;
val ntParen : char list -> string * char list = <fun>


Recursive productions (continued)

Testing ntParen on various inputs:

# test_string ntParen "()";;
- : string * string = ("ntParen", "->[]")
# test_string ntParen "";;
Exception: PC.X_no_match.
# test_string ntParen "((()))";;
- : string * string = ("ntParen", "->[]")
# test_string ntParen "((())())";;
Exception: PC.X_no_match.
# test_string ntParen "((()))ABC";;
- : string * string = ("ntParen", "->[ABC]")


Parsing combinators (continued)

▶ By now, our toolset of parsing combinators consists of
  ▶ const
  ▶ caten
  ▶ disj
  ▶ pack
  ▶ delayed
▶ We can handle recursive grammars
▶ We can create ASTs
▶ In principle, we can implement parsers for any language
☞ We now wish to add additional PCs to simplify the task of writing parsers


New PC Operators (continued)

The Kleene Star
The Kleene-star is a meta-production-rule, or a rule-schema, or a “macro” over production-rules.
▶ For any NT P, P∗ stands for the rule Pstar defined as follows:

    Pstar → P Pstar | ε

▶ The point of the Kleene-star is to recognize the catenation of zero or more expressions in P.

[Photo: Stephen Cole Kleene]


New PC Operators (continued)

Here is our support for the Kleene-star:

let rec star nt =
  fun s ->
  try let (e, s) = (nt s) in
      let (es, s) = (star nt s) in
      (e :: es, s)
  with X_no_match -> ([], s);;

Notice how we match ε implicitly.
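
With star available, the earlier grammar ⟨A⟩ → ( ⟨A⟩∗ ) can be written down directly. Unlike ntParen, it accepts several parenthesized groups side by side, such as ((())()). The sketch below is an illustration only: the type tree and the names nt_a, char and explode are hypothetical.

```ocaml
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match
let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)
let pack nt f s = let (e, s) = nt s in (f e, s)
let delayed thunk s = thunk () s
let rec star nt s =
  try let (e, s) = nt s in
      let (es, s) = star nt s in
      (e :: es, s)
  with X_no_match -> ([], s)
let char c = const (fun ch -> ch = c)
let explode str = List.init (String.length str) (String.get str)

(* Hypothetical AST: a node holds the list of groups nested inside it *)
type tree = Node of tree list

(* A -> ( A* ) *)
let rec nt_a s =
  pack (caten (char '(')
              (caten (star (delayed (fun () -> nt_a)))
                     (char ')')))
       (fun (_, (children, _)) -> Node children)
       s
```

The packing function discards the two parenthesis tokens and keeps only the list of children returned by star.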


New PC Operators (continued)

The Kleene-plus
▶ For any NT P, P+ stands for the rule Pplus defined as follows:

    Pplus → P Pplus | P

▶ The point of the Kleene-plus is to recognize the catenation of one or more expressions in P.
▶ Kleene didn’t really invent the Kleene-plus
  ▶ Rather, Kleene-plus is a natural extension of Kleene-star


New PC Operators (continued)

Here is our support for the Kleene-plus:

let plus nt =
  pack (caten nt (star nt))
       (fun (e, es) -> (e :: es));;

Notice how we define the Kleene-plus as the catenation of the original NT and its Kleene-star.


New PC Operators (continued)

Let’s test star and plus:

# let star_star = star (const (fun ch -> ch = '*'));;
val star_star : char list -> char list * char list = <fun>
# let star_plus = plus (const (fun ch -> ch = '*'));;
val star_plus : char list -> char list * char list = <fun>
# test_string star_star "****the end!";;
- : char list * string =
(['*'; '*'; '*'; '*'], "->[the end!]")
# test_string star_plus "****the end!";;
- : char list * string =
(['*'; '*'; '*'; '*'], "->[the end!]")
# test_string star_star "the end!";;
- : char list * string = ([], "->[the end!]")
# test_string star_plus "the end!";;
Exception: PC.X_no_match.


New PC Operators (continued)

Ocaml provides the polymorphic type α option = None | Some of α as a way of dealing with situations where a value may or may not exist.
We’re going to use α option to implement maybe, which takes a parser r, and returns a parser r? that recognizes zero or one occurrences of whatever is recognized by r.


New PC Operators (continued)

let maybe nt =
  fun s ->
  try let (e, s) = (nt s) in
      (Some(e), s)
  with X_no_match -> (None, s);;


New PC Operators (continued)

Assume you have the parser nt_integer, that recognizes integers. Here is how we might use maybe:

# test_string nt_integer "1234";;
- : int * string = (1234, "->[]")
# test_string (maybe nt_integer) "1234";;
- : int option * string = (Some 1234, "->[]")
# test_string (maybe nt_integer) "moshe";;
- : int option * string = (None, "->[moshe]")

You would use pattern matching (via match) to handle both cases (None/Some).
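As an illustration of matching on the result of maybe, here is a sketch of a parser for numbers with an optional leading minus sign. The names nt_nat and nt_signed are hypothetical, and the PC primitives are minimal reconstructions assumed for this example rather than the course's actual definitions.

```ocaml
(* Assumed minimal stand-ins for the PC primitives (not the course's code). *)
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match

let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)

let pack nt f s =
  let (e, s) = nt s in
  (f e, s)

let rec star nt s =
  try
    let (e, s) = nt s in
    let (es, s) = star nt s in
    (e :: es, s)
  with X_no_match -> ([], s)

let plus nt = pack (caten nt (star nt)) (fun (e, es) -> e :: es)

let maybe nt s =
  try
    let (e, s) = nt s in
    (Some e, s)
  with X_no_match -> (None, s)

(* Hypothetical parser for natural numbers: one or more decimal digits. *)
let nt_nat =
  pack (plus (const (fun ch -> '0' <= ch && ch <= '9')))
    (fun digits ->
      List.fold_left
        (fun n d -> 10 * n + (Char.code d - Char.code '0'))
        0 digits)

(* Hypothetical signed-number parser: the sign is optional, so we
   match on the option to decide whether to negate. *)
let nt_signed =
  pack (caten (maybe (const (fun ch -> ch = '-'))) nt_nat)
    (fun (sign, n) ->
      match sign with
      | Some _ -> -n
      | None -> n)
```

The match inside the packing function is the only place that needs to know whether the sign was present; the rest of the parser is ordinary catenation.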


New PC Operators (continued)

We might want to attach an arbitrary predicate to serve as a guard for a parser, so that the parser succeeds only if the matched object satisfies the guard. This is what the guard PC does:

let guard nt pred =
  fun s ->
  let ((e, s) as result) = (nt s) in
  if (pred e) then result
  else raise X_no_match;;


New PC Operators (continued)

Let’s use guard to identify only even numbers:

# test_string (guard nt_integer (fun n -> n land 1 = 0))
    "12345";;
Exception: PC.X_no_match.
# test_string (guard nt_integer (fun n -> n land 1 = 0))
    "123456";;
- : int * string = (123456, "->[]")

Since the guard may be an arbitrary predicate, this exceeds the expressive power of CFGs!
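To see how a guard can go beyond context-free grammars, consider the language of strings of the form a^n b^n c^n, which is a standard example of a language that is not context-free. The sketch below accepts a maximal run of the characters a, b, c only if it has that form. The helper names count_run, is_anbncn, nt_abc and nt_anbncn are hypothetical, and the PC primitives are minimal reconstructions assumed for illustration.

```ocaml
(* Assumed minimal stand-ins for the PC primitives (not the course's code). *)
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match

let rec star nt s =
  try
    let (e, s) = nt s in
    let (es, s) = star nt s in
    (e :: es, s)
  with X_no_match -> ([], s)

let guard nt pred s =
  let ((e, _) as result) = nt s in
  if pred e then result else raise X_no_match

(* Length of the leading run of c in a list, and what follows the run. *)
let rec count_run c = function
  | x :: rest when x = c ->
      let (n, rest) = count_run c rest in
      (n + 1, rest)
  | rest -> (0, rest)

(* True exactly for lists of the form a^n b^n c^n, with n >= 0. *)
let is_anbncn chars =
  let (na, rest) = count_run 'a' chars in
  let (nb, rest) = count_run 'b' rest in
  let (nc, rest) = count_run 'c' rest in
  rest = [] && na = nb && nb = nc

(* Any run of the characters a, b, c. *)
let nt_abc = star (const (fun ch -> 'a' <= ch && ch <= 'c'))

(* The same run, accepted only when the predicate holds. *)
let nt_anbncn = guard nt_abc is_anbncn
```

Note that the predicate is applied to whatever nt_abc matched greedily, so an input such as "aabbcca" is rejected even though its prefix "aabbcc" has the required form.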


Parser Combinators (continued)

What next?
🗸 Some simple parsers
🗸 Learn about the algebra of PCs
🗸 Learn of new PC operators
▶ Learn how to use abstraction to make our life simpler


Functional abstraction in PCs

We now wish to demonstrate some examples of using functional abstraction to write parsers in a general, consistent, and convenient way.

Until now, we defined single-character parsers using const:

let nt_A = const (fun ch -> ch = 'A');;

This is kind of clumsy. Let’s see how we can do this better!


Functional abstraction in PCs (continued)

let make_char equal ch1 = const (fun ch2 -> equal ch1 ch2);;

let char = make_char (fun ch1 ch2 -> ch1 = ch2);;

let char_ci =
  make_char (fun ch1 ch2 ->
              (Char.lowercase_ascii ch1) =
                (Char.lowercase_ascii ch2));;

The use of make_char allows us to define parser-generating functions for characters, in a case-sensitive or case-insensitive way.

☞ Warning: The version of ocaml installed in the labs uses Char.lowercase, which is now deprecated. It’ll be upgraded [next year].


Functional abstraction in PCs (continued)

# test_string (char 'a') "abc";;
- : char * string = ('a', "->[bc]")
# test_string (char 'a') "ABC";;
Exception: PC.X_no_match.
# test_string (char_ci 'a') "abc";;
- : char * string = ('a', "->[bc]")
# test_string (char_ci 'a') "ABC";;
- : char * string = ('A', "->[BC]")


Functional abstraction in PCs (continued)

If we wish to recognize entire words, this is still very cumbersome. We can put the algebra of catenation to good use and do better. To identify a word, we:

▶ Take a string of chars, and convert it to a list
▶ Map over each character in the list, creating a parser that recognizes that character
▶ Perform a right fold over that list using the caten operation (with an appropriate pack)
▶ The unit element is the unit element of catenation, namely epsilon

By abstracting over char we can get both case-sensitive and case-insensitive variants!


Functional abstraction in PCs (continued)

Here is the code:

let make_word char str =
  List.fold_right
    (fun nt1 nt2 -> pack (caten nt1 nt2)
                      (fun (a, b) -> a :: b))
    (List.map char (string_to_list str))
    nt_epsilon;;

let word = make_word char;;
let word_ci = make_word char_ci;;


Functional abstraction in PCs (continued)

# test_string (word "moshe")
    "moshe is a nice guy!";;
- : char list * string =
  (['m'; 'o'; 's'; 'h'; 'e'], "->[ is a nice guy!]")
# test_string (word_ci "moshe")
    "Moshe is a nice guy!";;
- : char list * string =
  (['M'; 'o'; 's'; 'h'; 'e'], "->[ is a nice guy!]")
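It may help to see what the right fold in make_word actually builds. The sketch below writes out by hand the parser that the fold produces for the word "hi" and places it next to the folded version. The helper cons and the names word_hi_unfolded and word_hi_folded are hypothetical, and the PC primitives (including string_to_list and nt_epsilon) are minimal reconstructions assumed for illustration.

```ocaml
(* Assumed minimal stand-ins for the PC primitives (not the course's code). *)
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match

let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)

let pack nt f s =
  let (e, s) = nt s in
  (f e, s)

(* The unit of catenation: consume nothing, return the empty list. *)
let nt_epsilon s = ([], s)

let string_to_list str = List.init (String.length str) (String.get str)

let char ch1 = const (fun ch2 -> ch1 = ch2)

(* The packing function used at every step of the fold. *)
let cons (a, b) = a :: b

(* The fold for "hi", expanded by hand:
   'h' followed by ('i' followed by epsilon). *)
let word_hi_unfolded =
  pack (caten (char 'h')
          (pack (caten (char 'i') nt_epsilon) cons))
    cons

(* The same parser, produced by the fold. *)
let make_word char str =
  List.fold_right
    (fun nt1 nt2 -> pack (caten nt1 nt2) cons)
    (List.map char (string_to_list str))
    nt_epsilon

let word_hi_folded = make_word char "hi"
```

The innermost parser is nt_epsilon, which contributes the empty list; each enclosing pack then conses one matched character onto the result, which is why the word comes back as a char list in the original order.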


Functional abstraction in PCs (continued)

We might want to pick any single character in a string. Rather than specifying long disjunctions, we can use one_of to do this for us.

▶ Very similar to word:
  ▶ We use disj rather than caten
  ▶ The unit element for disj is nt_none

Such is the power of abstraction!


Functional abstraction in PCs (continued)

let make_one_of char str =
  List.fold_right
    disj
    (List.map char (string_to_list str))
    nt_none;;

let one_of = make_one_of char;;
let one_of_ci = make_one_of char_ci;;

As usual, we generate both the case-sensitive and case-insensitive versions!


Functional abstraction in PCs (continued)

Let’s try out one_of:

# test_string (one_of "abcdef") "moshe!";;
Exception: PC.X_no_match.
# test_string (one_of "abcdef") "be moshe!";;
- : char * string = ('b', "->[e moshe!]")


Functional abstraction in PCs (continued)

When we wanted to recognize a range of characters, we, once again, used the const PC. We can do better using abstraction:

let make_range leq ch1 ch2 (s : char list) =
  const (fun ch -> (leq ch1 ch) && (leq ch ch2)) s;;

let range = make_range (fun ch1 ch2 -> ch1 <= ch2);;

let range_ci =
  make_range (fun ch1 ch2 ->
               (Char.lowercase_ascii ch1) <=
                 (Char.lowercase_ascii ch2));;


Functional abstraction in PCs (continued)

And here is how we can test range:

# test_string (star (range 'a' 'z')) "hello world!";;
- : char list * string =
  (['h'; 'e'; 'l'; 'l'; 'o'], "->[ world!]")
# test_string (star (range 'a' 'z')) "HELLO WORLD!";;
- : char list * string =
  ([], "->[HELLO WORLD!]")
# test_string (star (range_ci 'a' 'z')) "Hello World!";;
- : char list * string =
  (['H'; 'e'; 'l'; 'l'; 'o'], "->[ World!]")
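As a further illustration of how range composes with the other combinators, here is a sketch of a parser for simple identifiers: a letter followed by any number of letters or digits. The names nt_letter, nt_digit and nt_ident are hypothetical, and the PC primitives are minimal reconstructions assumed for this example, not the course's actual definitions.

```ocaml
(* Assumed minimal stand-ins for the PC primitives (not the course's code). *)
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match

let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)

let pack nt f s =
  let (e, s) = nt s in
  (f e, s)

(* Try nt1; if it fails, try nt2 on the same input. *)
let disj nt1 nt2 s =
  try nt1 s with X_no_match -> nt2 s

let rec star nt s =
  try
    let (e, s) = nt s in
    let (es, s) = star nt s in
    (e :: es, s)
  with X_no_match -> ([], s)

let make_range leq ch1 ch2 (s : char list) =
  const (fun ch -> leq ch1 ch && leq ch ch2) s

let range = make_range (fun ch1 ch2 -> ch1 <= ch2)

let range_ci =
  make_range (fun ch1 ch2 ->
    Char.lowercase_ascii ch1 <= Char.lowercase_ascii ch2)

(* Hypothetical building blocks for identifiers. *)
let nt_letter = range_ci 'a' 'z'
let nt_digit = range '0' '9'

(* A letter, then letters or digits; the result is packed into a string. *)
let nt_ident =
  pack (caten nt_letter (star (disj nt_letter nt_digit)))
    (fun (c, cs) ->
      String.concat "" (List.map (String.make 1) (c :: cs)))
```

The case-insensitive range keeps the original case of each matched character, so the identifier is returned exactly as it appeared in the input.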


Functional abstraction in PCs (continued)

How might you debug parsers written using PCs?

▶ The PC trace_pc is a wrapper (using the decorator pattern) that can be used to trace any parser
▶ The trace_pc PC takes a documentation string and a parser, and returns a tracing parser.

# test_string (trace_pc "The word \"hi\"" (word "hi"))
    "high";;
;;; The word "hi" matched the head of "high",
    and the remaining string is "gh"
- : char list * string = (['h'; 'i'], "->[gh]")
# test_string (trace_pc "The word \"hi\"" (word "hi"))
    "bye";;
;;; The word "hi" failed on "bye"
Exception: PC.X_no_match.
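The slides do not show how trace_pc is implemented. The sketch below is one plausible way to write such a wrapper, under the assumption that parsers are functions from a char list to a result paired with the remaining char list. The names trace_pc_sketch, list_to_string, nt_hi and nt_hi_traced are hypothetical, and the PC primitives are minimal reconstructions, so this should not be read as the course's actual code.

```ocaml
(* Assumed minimal stand-ins for the PC primitives (not the course's code). *)
exception X_no_match

let const pred = function
  | e :: s when pred e -> (e, s)
  | _ -> raise X_no_match

let caten nt1 nt2 s =
  let (e1, s) = nt1 s in
  let (e2, s) = nt2 s in
  ((e1, e2), s)

let pack nt f s =
  let (e, s) = nt s in
  (f e, s)

let char ch1 = const (fun ch2 -> ch1 = ch2)

(* Hypothetical helper: turn a char list back into a string for printing. *)
let list_to_string chars =
  String.concat "" (List.map (String.make 1) chars)

(* Hypothetical tracing wrapper: runs nt, reports success or failure,
   and otherwise behaves exactly like nt. *)
let trace_pc_sketch desc nt s =
  try
    let ((_, rest) as result) = nt s in
    Printf.printf
      ";;; %s matched the head of \"%s\",\n    and the remaining string is \"%s\"\n"
      desc (list_to_string s) (list_to_string rest);
    result
  with X_no_match ->
    Printf.printf ";;; %s failed on \"%s\"\n" desc (list_to_string s);
    raise X_no_match

(* A small parser for the word "hi", and its traced version. *)
let nt_hi = pack (caten (char 'h') (char 'i')) (fun (a, b) -> [a; b])
let nt_hi_traced = trace_pc_sketch "The word \"hi\"" nt_hi
```

Because the wrapper returns the wrapped parser's result unchanged and re-raises X_no_match on failure, it can be placed around any sub-parser without altering the behaviour of the parser that contains it.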


Parser Combinators (continued)

What next?
🗸 Some simple parsers
🗸 Learn about the algebra of PCs
🗸 Learn of new PC operators
🗸 Learn how to use abstraction to make our life simpler


Further reading

🔗 Parsing Combinators
