Parsing with Grammars

Dr. Vadim Zaytsev, Universiteit van Amsterdam

UvA MSc SE: Software Construction 2015

CSS CodeSurfer Erlang Unlambda Assembly Pascal Delphi SVG LaTeX DITA sh PCRE Promela EMF Ecore ATL Ada Turbo Vision Haskell Rascal Java C# Ruby Python make git C++ Scheme PHP ANTLR Jenkins XSLT C ksh DHTML JS Perl CGI Matlab Maple Prolog XML XSD DTD LCF LDF XLDF BGF XBGF EDD BNF EBNF FST ΞBGF ASF SDF GDK GRK Smalltalk COBOL GWBasic QBasic VB Flash GIF Inkscape Eclipse GrammarLab LCI Grammar Hunter DMS MetaEnvironment HTML XHTML Django Zope Wordpress MediaWiki Wikidot Wikia Markdown bibTeX IDA WinIce M3 JSON 80x86 SQL SPARQL XCK FPU yED GraphML SPIN SoftIce DeGlucker TSR Graphviz dot Subversion CVS Grammar Hawk PDG DCG jQuery phpbb CRC Blowfish HASP OS/400 JCL JAXB

Parsing with Grammars - Vadim Zaytsev · grammarware.net/slides/2014/parsing-with-grammars.pdf · 2015-06-03




Grammars & parsing are among the most established areas of CS/SE


N. Chomsky, Syntactic Structures,

1957


N. Chomsky, Aspects of the

Theory of Syntax, 1965


A.V. Aho & J.D. Ullman,

The Theory of Parsing,

Translation and Compiling,

Volumes I + II, 1972


A.V. Aho, R. Sethi,

J.D. Ullman, Compilers: Principles,

Techniques and Tools, 1986


D. Grune, C.J.H. Jacobs,

Parsing Techniques: A Practical Guide, 2nd ed.,

2008


Why are grammars and parsing relevant?


Language

• Programming languages: C, Java, C#, JavaScript

• Markup languages: HTML, XML, TeX, Markdown, wikis

• Domain-specific languages: BibTeX, CSS, SQL, QL

• Data formats: JSON, log files, protocol data, bytecode

• …

• (formally: a set of strings)


How to define a language?

• List all the sentences!

• Infinite languages?

• Finite recipes = grammars

• Infinite grammars?

• Two level grammars


Example

• Valid sentences/programs/instances:

• Alice

• Alice and Bob

• Alice, Bob and Coen

• Alice, Bob, Coen and Daenerys

• …

• How to define a recipe?


ABCD Grammar

Name → Alice

Name → Bob

Name → Coen

Name → Daenerys

Sentence → Name

Sentence → List End

List → Name

List → List , Name

, Name End → and Name


ABCD Grammar: terminology

• Terminal symbols: Alice, Bob, Coen, Daenerys, the comma and the word "and"

• Nonterminal symbols: Name, Sentence, List, End

• Starting symbol: Sentence

• Production rules: every line of the form "left-hand side → right-hand side"


Using ABCD

• Alice and Bob

• Alice and Bob → Name and Bob → Name and Name → Name , Name End → List , Name End → List End → Sentence

• (analytic semantics)

• Alice and Bob

• Sentence → List End → List , Name End → Name , Name End → Name and Name → Alice and Name → Alice and Bob

• (generative semantics: production)
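The generative reading can be tried out mechanically. The following is a minimal Python sketch, not part of the original slides: it treats the ABCD rules as string rewriting over tuples of symbols and enumerates the sentences reachable from the starting symbol. The names `RULES`, `successors` and `generate` are made up for this illustration, and the rule `Sentence → Name` is an assumption borrowed from the context-free variant of the grammar so that a single name counts as a sentence.

```python
from collections import deque

NAMES = ("Alice", "Bob", "Coen", "Daenerys")
NONTERMINALS = {"Sentence", "List", "Name", "End"}

# Production rules as (left-hand side, right-hand side) tuples of symbols.
RULES = [
    (("Sentence",), ("Name",)),  # assumption: taken from the CFG variant
    (("Sentence",), ("List", "End")),
    (("List",), ("Name",)),
    (("List",), ("List", ",", "Name")),
    ((",", "Name", "End"), ("and", "Name")),  # the context-sensitive rule
] + [(("Name",), (name,)) for name in NAMES]


def successors(form):
    """Yield every sentential form reachable from `form` in one rewriting step."""
    for lhs, rhs in RULES:
        for i in range(len(form) - len(lhs) + 1):
            if form[i:i + len(lhs)] == lhs:
                yield form[:i] + rhs + form[i + len(lhs):]


def generate(max_len):
    """Return all sentences derivable without any form exceeding max_len symbols."""
    start = ("Sentence",)
    seen, queue, sentences = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if NONTERMINALS.isdisjoint(form):
            sentences.add(" ".join(form))  # only terminals left: a sentence
            continue
        for nxt in successors(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sentences
```

Symbols are joined with spaces, so the comma appears as its own token: `generate(6)` contains `"Alice , Bob and Coen"`, while `"Alice , Bob"` is never produced because `End` can only disappear through the rule that introduces `and`.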


Notations

• Name → Alice | Bob | Coen | Daenerys

• Name → "Alice" | "Bob" | "Coen" | "Daenerys"

• Name → “Alice” | “Bob” | “Coen” | “Daenerys”

• ⟨Name⟩ → Alice | Bob | Coen | Daenerys

• ⟨Name⟩ → “Alice” | “Bob” | “Coen” | “Daenerys”


Notations

• List → List , Name

• ⟨List⟩ ::= ⟨List⟩ "," ⟨Name⟩ ;

• List "," Name -> List

• List <- List ',' Name

• define List [List] , [Name] end define

• syntax List = List "," Name;

• List -> List ',' Name : ['$1'|'$3'].


Common metaconstructs

• Optional symbols

• A?, [A]

• Zero or more (Kleene star)

• A*, {A}

• One or more

• A+

• Choice (disjunction)

• A | B, A / B

• Less common (careful!)

• conjunction

• negation

• exact repetition

• reference naming

• priorities


Chomsky-Schützenberger hierarchy

• Type-0: Recursively enumerable

• Rules: α → β (unrestricted)

• Type-1: Context-sensitive

• Rules: αAβ → αγβ

• Type-2: Context-free

• Rules: A → γ

• Type-3: Regular

• Rules: A → a and A → aB

Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.


CFG for ABCD

⟨Name⟩ → “Alice” | “Bob” | “Coen” | “Daenerys”

⟨Sentence⟩ → ⟨Name⟩ | ⟨List⟩ “and” ⟨Name⟩

⟨List⟩ → ⟨Name⟩ “,” ⟨List⟩ | ⟨Name⟩

⟨List⟩ → ⟨Name⟩ (“,” ⟨Name⟩)*

⟨List⟩ → {⟨Name⟩ “,”}+


Regexp for ABCD

^\w+((, \w+)* and \w+)?$

S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, pp. 3–42, 1956. photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
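The regular expression above can be checked directly with Python's standard `re` module. This is a small sketch; the helper name `is_abcd` is made up here. Note that `\w+` accepts any word, so the pattern describes the list structure only and does not restrict names to Alice, Bob, Coen and Daenerys.

```python
import re

# The pattern from the slide, unchanged.
ABCD = re.compile(r"^\w+((, \w+)* and \w+)?$")


def is_abcd(sentence):
    """Return True if the sentence has the shape 'A', 'A and B', 'A, B and C', ..."""
    return ABCD.match(sentence) is not None
```

For instance, `is_abcd("Alice, Bob and Coen")` holds, while `is_abcd("Alice, Bob")` does not, because a comma-separated list must end with `and` plus a final name.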



Fig. 2.33. The silhouette of a rose, approximated by Type 3 to Type 0 grammars

Rose by Arwen Grune; p.58 of Grune/Jacobs’ “Parsing Techniques”, 2008


Finite world

• Explicitly given lists

• Acyclic automata

• Finite choice grammars (non-recursive, non-iterating)

• e.g., users, keywords, postcodes


Regular world

• Regular expressions

• Finite automata

• Grammars: • A → a • A → aB

• e.g., substring search, substring replace, counting


(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ 
\t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)

“somewhat pushes the limits of what it is sensible to do with regular expressions”

Jeff Atwood, Regex use vs. Regex abuse, 16 Feb 2005. RFC822. Paul Warren, Mail::RFC822::Address: regexp-based address validation, 17/09/2012.


Context-free world

• Grammarware and software languages

• Nondeterministic pushdown automata

• Grammars: • A → γ

• A → BC • A → a • A → ε

• e.g., parsing, pretty-printing, etc.

Jochgem, https://commons.wikimedia.org/wiki/File:Pushdown-overview.svg, CC-BY-SA, 2008.


Context-sensitive world

• Computer

• Linear-bounded automata

• Grammars: • αAβ → αγβ

• i.e., anything practical

http://www.legoturingmachine.org


Unrestricted world

• Imaginary machines

• Turing machine, λ-calculus, semi-Thue rewriting systems, Lindenmayer systems, Markov algorithm, …

• Grammars: • α → β

• i.e., almost anything

parsing is impossible

recognising is impossible


In practice…

• Regular grammars are used for lexical analysis: keywords, constants, comments (if not nested)

• Context-free grammars are used for structured/nested constructs: class declaration, if statement, …

• Everything else: annotations + hacking


A sentence

position := initial + rate * 60


Lexical tokens
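Lexical analysis of the sentence `position := initial + rate * 60` can be sketched with regular expressions only. The token names (`ID`, `ASSIGN`, `OP`, `NUMBER`) and the function `tokenize` are assumptions chosen for this illustration, not taken from the slides.

```python
import re

# One regular expression per token class; SKIP (whitespace) is dropped.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+*]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token class, lexeme) pairs, or raise SyntaxError."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize("position := initial + rate * 60")` yields seven tokens: three identifiers, the assignment sign, two operators and one number. Nesting and operator priority are not visible at this level; they are the job of the context-free grammar.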


Parse tree


PEG

• Parsing Expression Grammars, introduced in 2004

• Analytic grammars; top-down unambiguous recognition

• Explicit backtracking, ordered disjunction

• Linear parsing time (with memoisation)

• Do not fit into the Chomsky-Schützenberger hierarchy

• Conjunction and negation are purely lookahead-based


PEG example

• S ← &X 'a'+ Y

• X ← Z 'c'

• Z ← 'a' Z? 'b'

• Y ← 'b' Y? 'c'

{aⁿbⁿcⁿ | n>0}

fixed from https://en.wikipedia.org/wiki/Parsing_expression_grammar
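The four PEG rules above translate almost literally into a hand-written recogniser. The sketch below is an illustration in Python: each function takes the input and a position and returns the position after a successful match, or `None` on failure. The function `recognise` and its requirement that the whole input be consumed are additions made here; the rules on the slide do not state an end-of-input check.

```python
def Z(s, i):
    """Z <- 'a' Z? 'b'"""
    if not s.startswith("a", i):
        return None
    j = i + 1
    k = Z(s, j)  # Z? is greedy: keep the result if it succeeds
    if k is not None:
        j = k
    return j + 1 if s.startswith("b", j) else None


def Y(s, i):
    """Y <- 'b' Y? 'c'"""
    if not s.startswith("b", i):
        return None
    j = i + 1
    k = Y(s, j)
    if k is not None:
        j = k
    return j + 1 if s.startswith("c", j) else None


def X(s, i):
    """X <- Z 'c'"""
    j = Z(s, i)
    return j + 1 if j is not None and s.startswith("c", j) else None


def S(s, i):
    """S <- &X 'a'+ Y"""
    if X(s, i) is None:  # &X: lookahead only, consumes nothing
        return None
    j = i
    while s.startswith("a", j):  # 'a'+
        j += 1
    return Y(s, j) if j > i else None


def recognise(s):
    """True if the whole string is of the form a^n b^n c^n with n > 0."""
    return S(s, 0) == len(s)
```

The lookahead `&X` checks that the numbers of a's and b's agree, and `Y` then checks that the numbers of b's and c's agree, so `recognise("aabbcc")` holds while `recognise("aabbc")` and `recognise("aabcc")` do not.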


What is a parser? (recogniser, compiler, …)


Recogniser


Parser


Interpreter


Compiler


Parsing


Bottom-up parsing

• Reduce the input back to the start symbol

• Recognise terminals

• Replace terminals by nonterminals

• Replace terminals and nonterminals by left-hand side of rule

• LR, LR(0), LR(1), LR(k), LALR, SLR, GLR, SGLR, CYK, …
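The reduce-to-start-symbol idea can be illustrated with a deliberately naive shift-reduce loop. This is a sketch under stated assumptions, not one of the LR algorithms listed above: it uses a small expression grammar (`Expr -> Expr + Term | Expr - Term | Term`, `Term -> digit`) chosen for the illustration, and it simply reduces greedily whenever the top of the stack matches a right-hand side, trying longer rules first. Real LR parsers decide between shifting and reducing with a generated table.

```python
# (left-hand side, right-hand side); order matters for the greedy strategy.
RULES = [
    ("Expr", ("Expr", "+", "Term")),
    ("Expr", ("Expr", "-", "Term")),
    ("Term", ("digit",)),
    ("Expr", ("Term",)),
]


def shift_reduce(text):
    """Return (accepted, trace) where trace lists the reductions performed."""
    stack, trace = [], []
    for ch in text:
        # Shift: recognise the terminal and push it.
        stack.append("digit" if ch in "0123456789" else ch)
        # Reduce: replace a right-hand side on top of the stack by its left-hand side.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(f"{lhs} <- {' '.join(rhs)}")
                    reduced = True
                    break
    return stack == ["Expr"], trace
```

For the input `3+5-1` the trace ends with `Expr <- Expr - Term`, and the stack holds only the start symbol, which means the input was reduced back to `Expr`.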


Top-down parsing

• Imitate the production process by rederivation

• Each nonterminal is a goal

• Replace each goal by subgoals (= elements of its rule)

• Parse tree is built from top to bottom

• LL, LL(1), LL(k), LL(*), GLL, DCG, recursive descent, Packrat, Earley


How to parse with a grammar?

• Write a parser manually

• good error handling

• possible fine-tuning

• a lot of work (seriously, years)

• Generate with a parser generator

• less work (maybe)

• complex, rigid, idiosyncratic frameworks

• difficult error handling


Manually: recursive descent

• Grammar:

• A ::= x B C;

• Parser:

• A() { match('x'); B(); C(); }


Manually: recursive descent

• Grammar:

• A ::= x B C | y D | ε ;

• Parser:

• A() { if (lookahead=='x') { match('x'); B(); C(); } else if (lookahead=='y') { match('y'); D(); } else ; }
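A runnable version of this predictive parser might look as follows in Python. The slide does not define B, C and D, so the rules `B ::= b`, `C ::= c` and `D ::= d` are hypothetical placeholders, as are the class name `Parser` and the helper `accepts`.

```python
class Parser:
    """Recursive descent for A ::= x B C | y D | ε with one symbol of lookahead."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    @property
    def lookahead(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match(self, ch):
        if self.lookahead != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def A(self):
        if self.lookahead == "x":
            self.match("x"); self.B(); self.C()
        elif self.lookahead == "y":
            self.match("y"); self.D()
        # otherwise: the ε alternative, which consumes nothing

    # Hypothetical rules, only here to make the sketch executable.
    def B(self): self.match("b")
    def C(self): self.match("c")
    def D(self): self.match("d")


def accepts(text):
    """True if A derives the whole input."""
    parser = Parser(text)
    try:
        parser.A()
    except SyntaxError:
        return False
    return parser.pos == len(text)
```

Each alternative starts with a different terminal, so one symbol of lookahead is enough to choose: `accepts("xbc")`, `accepts("yd")` and `accepts("")` all hold, while `accepts("xb")` fails inside `C`.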


Manually: recursive descent

• Grammar:

• A ::= x B C | x D ;

• Parser:

• A() { if (lookahead=='x') { match('x'); B(); C(); } else { match('x'); D(); } }

solved by backtracking: try one alternative and, if it fails, return to the starting position and try the next
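Backtracking is easy to express when every parsing function returns the new position on success and `None` on failure, because the caller still holds the old position. The sketch below uses three small combinators (`term`, `seq`, `alt`) invented for this illustration, again with hypothetical rules `B ::= b`, `C ::= c`, `D ::= d`. It implements local backtracking only: the first alternative that succeeds wins.

```python
def term(ch):
    """Match a single terminal."""
    def parse(s, i):
        return i + 1 if s.startswith(ch, i) else None
    return parse


def seq(*parsers):
    """Match all parsers one after another."""
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse


def alt(*parsers):
    """Try each alternative from the same starting position (backtracking)."""
    def parse(s, i):
        for p in parsers:
            j = p(s, i)
            if j is not None:
                return j
        return None
    return parse


# Hypothetical rules for B, C and D; A ::= x B C | x D as on the slide.
B, C, D = term("b"), term("c"), term("d")
A = alt(seq(term("x"), B, C), seq(term("x"), D))
```

For the input `xd`, the first alternative consumes `x`, fails on `B`, and `alt` restarts the second alternative at position 0, so `A("xd", 0)` returns 2.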


Left factoring

• For some parsers, this is inefficient:

• S -> if E then S else S | if E then S

• Can be rewritten as this:

• S -> if E then S P

• P -> else S | ε

sometimes the only way
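With the left-factored grammar, a predictive parser no longer needs to guess between the two `if` alternatives. The Python sketch below works on whitespace-separated tokens. Since the slide gives no base case for S and no rule for E, the tokens `stmt` and `cond` are assumed stand-ins for a simple statement and a condition.

```python
def expect(tokens, i, tok):
    """Consume one specific token or fail."""
    if i < len(tokens) and tokens[i] == tok:
        return i + 1
    raise SyntaxError(f"expected {tok!r} at token {i}")


def parse_E(tokens, i):
    """E -> cond   (assumed placeholder)"""
    return expect(tokens, i, "cond")


def parse_S(tokens, i=0):
    """S -> if E then S P | stmt   (the 'stmt' alternative is an assumed base case)"""
    if i < len(tokens) and tokens[i] == "stmt":
        return i + 1
    i = expect(tokens, i, "if")
    i = parse_E(tokens, i)
    i = expect(tokens, i, "then")
    i = parse_S(tokens, i)
    return parse_P(tokens, i)


def parse_P(tokens, i):
    """P -> else S | ε"""
    if i < len(tokens) and tokens[i] == "else":
        return parse_S(tokens, i + 1)
    return i  # ε: consume nothing


def accepts(text):
    tokens = text.split()
    try:
        return parse_S(tokens) == len(tokens)
    except SyntaxError:
        return False
```

The common prefix `if E then S` is parsed once, and the decision about `else` is postponed to `P`, where a single token of lookahead settles it. In this sketch an `else` attaches to the nearest `if`.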


Manually: recursive descent

• Grammar:

• Expr ::= Expr '+' Expr ;

• Parser:

• Expr() { Expr(); match('+'); Expr(); }

solved by ‘optimising’ the grammar


Left recursion removal

• Expr -> Expr + Term

• Expr -> Expr - Term

• Expr -> Term

• Term -> [0-9]

• Expr -> Term Rest

• Rest -> + Term Rest | - Term Rest | ε

• Term -> [0-9]
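The rewritten grammar (`Expr -> Term Rest`) can be parsed by recursive descent without looping forever. The Python sketch below also evaluates the expression. Passing the value computed so far into `rest` is a design choice made for this illustration: it keeps `+` and `-` left-associative even though the grammar itself has become right-recursive.

```python
def parse_expr(text):
    """Evaluate a string of single digits joined by + and -, left to right."""
    pos = 0

    def term():
        # Term -> [0-9]
        nonlocal pos
        if pos < len(text) and text[pos] in "0123456789":
            pos += 1
            return int(text[pos - 1])
        raise SyntaxError(f"digit expected at position {pos}")

    def rest(acc):
        # Rest -> + Term Rest | - Term Rest | ε
        nonlocal pos
        if pos < len(text) and text[pos] in "+-":
            op = text[pos]
            pos += 1
            right = term()
            return rest(acc + right if op == "+" else acc - right)
        return acc  # ε

    value = rest(term())  # Expr -> Term Rest
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return value
```

With the accumulator, `parse_expr("3-5-1")` computes (3 - 5) - 1, which is the meaning the original left-recursive grammar suggests.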


Ambiguity: one sentence, several possible trees

sometimes possible to annotate the grammar with priorities


[Figure: two parse trees for 3 + 5 + 1; both evaluate to 9]

Fig. 3.2. Spurious ambiguity: no change in semantics

[Figure: two parse trees for 3 - 5 - 1; one evaluates to −1, the other to −3]

Fig. 3.3. Essential ambiguity: the semantics differ

Strangely enough, languages can also be ambiguous: there are (context-free) languages for which there is no unambiguous grammar. Such languages are inherently ambiguous. An example is the language $L = a^m b^n c^n \cup a^p b^p c^q$. Sentences in $L$ consist either of a number of a's followed by a nested sequence of b's and c's, or of a nested sequence of a's and b's followed by a number of c's. Example sentences are: abcc, aabbc, and aabbcc; abbc is an example of a non-sentence. $L$ is produced by the grammar of Figure 3.4.

Ss ---> AB | DC

A ---> a | aA

B ---> bc | bBc

D ---> ab | aDb

C ---> c | cC

Fig. 3.4. Grammar for an inherently ambiguous language

Intuitively, it is reasonably clear why $L$ is inherently ambiguous: any part of the grammar that produces $a^m b^n c^n$ cannot avoid producing $a^n b^n c^n$, and any part of the grammar that produces $a^p b^p c^q$ cannot avoid producing $a^p b^p c^p$. So whatever we do, forms with equal numbers of a's, b's, and c's will always be produced twice. Formally proving that there is really no way to get around this is beyond the scope of this book.

p.64 of Grune/Jacobs’ “Parsing Techniques”, 2008
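The claim that sentences with equal numbers of a's, b's and c's are produced twice can be checked for the grammar of Fig. 3.4 by counting parse trees. The Python sketch below is an illustration, not taken from the book: it counts, for every symbol and every substring, the number of distinct derivations, using memoisation. It relies on this grammar having no ε-rules, so every symbol covers at least one character.

```python
from functools import lru_cache

# Grammar of Fig. 3.4; upper-case letters are nonterminals, S is the start symbol.
GRAMMAR = {
    "S": ["AB", "DC"],
    "A": ["a", "aA"],
    "B": ["bc", "bBc"],
    "D": ["ab", "aDb"],
    "C": ["c", "cC"],
}


def count_trees(sentence):
    """Return the number of distinct parse trees of `sentence` from S."""

    @lru_cache(maxsize=None)
    def symbol(x, i, j):
        # Number of ways symbol x derives sentence[i:j].
        if x in GRAMMAR:
            return sum(sequence(rhs, i, j) for rhs in GRAMMAR[x])
        return 1 if sentence[i:j] == x else 0

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol string rhs derives sentence[i:j].
        if not rhs:
            return 1 if i == j else 0
        return sum(
            symbol(rhs[0], i, k) * sequence(rhs[1:], k, j)
            for k in range(i + 1, j + 1)
        )

    return symbol("S", 0, len(sentence))
```

Under this grammar `count_trees("aabbcc")` is 2 (once via `AB`, once via `DC`), `count_trees("abcc")` is 1, and the non-sentence `abbc` gets 0.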


Disambiguation

• Expr -> Expr + Atom

• Expr -> Expr - Atom

• Expr -> Atom

• Atom -> Number

• Atom -> Variable

• Expr -> Expr + Term

• Expr -> Term

• Term -> Term * Primary

• Term -> Primary

• Primary -> Number

• Primary -> Variable
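The layered grammar (Expr, Term, Primary) encodes that `*` binds tighter than `+` and that both associate to the left. The Python sketch below builds a tree of nested tuples to make this visible. Two assumptions are made here: the left-recursive rules are implemented as loops, which yields the same left-leaning trees, and the regular expression used for tokenising silently skips characters it does not know.

```python
import re


def parse(text):
    """Parse sums and products of numbers and variables into nested tuples."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+*]", text)
    pos = 0

    def primary():
        # Primary -> Number | Variable
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "*"):
            raise SyntaxError("expected a number or a variable")
        tok = tokens[pos]
        pos += 1
        return int(tok) if tok.isdigit() else tok

    def term():
        # Term -> Term * Primary | Primary
        nonlocal pos
        node = primary()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, primary())
        return node

    def expr():
        # Expr -> Expr + Term | Term
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return tree
```

For `1 + 2 * x` the product ends up below the sum, `("+", 1, ("*", 2, "x"))`, and `1 + 2 + 3` leans to the left, `("+", ("+", 1, 2), 3)`, so each sentence has exactly one tree.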


ALL “optimisations” are parsing tech-specific!

(and make grammars uglier)


Recall: parser


Parser generator


Parser generators: ↑

• LALR(1): Beaver; YACC, byacc, bison, etc.; Eli; Irony; SableCC; yecc

• GLR: bison; DMS; GDK; Tom

• SGLR: ASF+SDF MetaEnv; Spoofax, Stratego/XT


Parser generators: ↓

• LL(k): JavaCC

• LL(*): ANTLR

• Earley: Marpa; ModelCC

• GLL: Rascal (SGTDBF); gll-combinators in Scala

• Packrat: Rats!; OMeta; PetitParser

• Others: TXL


Summary of parsing techniques

• Top-down

• predict-match and variants/improvements

• backtracking is good; memoisation is good

• left recursion is generally problematic

• Bottom-up

• shift-reduce and variants

• deterministic CFLs in linear time


Conclusion

• “Making a linear-time parser for an arbitrary given grammar is 10% hard work; the other 90% can be done by computer”. [Grune/Jacobs, Parsing Techniques, 2008, p.81]

• Every notation/metatool defines a class of Gs/Ls

• Parsing should never be more complex than O(n³)

• Many books/papers exist; beware of bullshit!


Credits

• Given on the bottom of each slide

• unless self-made or public domain

• Font

• Intuitive by Bruno de Souza Leão, OFL, http://openfontlibrary.org/en/font/intuitive

• Feedback

• http://grammarware.net, http://grammarware.github.io, …