28
© Neil Mitchell 2004 1 A New Parser Neil Mitchell

A New Parser - Haskell Community Server

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

© Neil Mitchell 2004 1

A New Parser

Neil Mitchell

© Neil Mitchell 2004 2

Disclaimer

This is not something I have done as part of my PhDI have done it on my ownI haven’t researched other systemsIts not finishedAny claims may turn out to be wrong

© Neil Mitchell 2004 3

Overview

What is parsing?What systems exist?What do you want to parse?What is my system?Why is mine better (or not)?How do you interact with a parser?

© Neil Mitchell 2004 4

Parsing is a function

From Text to Abstract Syntax TreeParse :: String -> Tree(Token)How?

Hand codedUsing Lex/Yacc (or Flex/Bison)Parser combinatiorsEtc…

© Neil Mitchell 2004 5

What Systems Exist?

Lex/Yacc – the classicCreated to write parsers for CSteps

Write a grammar file, including C codeGenerate a C fileCompile C fileLink to your code

© Neil Mitchell 2004 6

Lex

Lex :: String -> List(Token)

Uses Regular Expressions to split up a string into various lexemes.

Runs in O(n), using Finite State Automata.

© Neil Mitchell 2004 7

Yacc

Yacc :: List(Token) -> Tree(Token)

Based on a BNF grammar.Runs in just over O(n), using an LALR(1)

stack automaton.Often fails unpredictably…

© Neil Mitchell 2004 8

Early

Early :: List(Token) -> Tree(Token)

Almost identical to Yacc, but removes the unpredictable failures, requiring less knowledge of LALR(1)

A fair bit slower, worst case of O(n3) or O(n2.6) depending on implementation.

© Neil Mitchell 2004 9

Parser

ParserC = Yacc . LexParserHaskell = Happy . AlexParserJava = …

Very language dependant, Yacc/Lex both tied to C

© Neil Mitchell 2004 10

Bad points

Language dependantYacc – shift/reduce conflictNot CFGNot very intuitive to write Yacc

Summary: Lex good, Yacc bad

© Neil Mitchell 2004 11

What do you want to parse?

Languages: Haskell, C#, JavaConfigurations: INI, XMLGrammar files for this toolNOT: Perl, Latex, HTML, C++

Insane syntaxHorrid historyTwisted parody of languages

© Neil Mitchell 2004 12

Brackets, Strings, Escapes

Brackets () [] {} <> - YaccStrings “” ‘’ – LexAre strings not brackets, just which disallow nesting?What about escape characters?

Parse them in Lex: “((\.)|.)*”Re-parse them later

© Neil Mitchell 2004 13

My System

Bracket :: String -> Tree(Token)Lex :: String -> List(Token)Group :: Tree(Token) -> Tree(Token)

Parser :: String -> Tree(Token)Parser = Group . map Lex . Bracket

© Neil Mitchell 2004 14

Bracket

Match brackets, strings, escape charsDefine nesting

main = 0 * all [lexer]all : round stringround = "(" ")" allstring = "\"" "\"" escape [raw]escape = "\\" 1

© Neil Mitchell 2004 15

Lex

Same as traditional Lex, but…Easier – no need to do string escapingCan be different for different parts

In comments use [none]In strings use [raw]Can have many lexers for different parts

© Neil Mitchell 2004 16

Lex (2)

keyword = `[a-zA-Z][a-zA-Z0-9_]*`number = `[0-9]+`white = `[ \t]`star = "*"eq = "="for.while.

© Neil Mitchell 2004 17

Group

Group :: Tree(Token) -> Tree(Token)Id :: a -> a

Therefore "Group = Id" works

Sometimes you need a higher level of structure, what the brackets meanThe most complex element (unfortunately)

© Neil Mitchell 2004 18

Group (2)

root = main[*{rule literal}]

rule = line[ keyword eq {regexp string} ]

literal = line[ keyword dot ]

© Neil Mitchell 2004 19

Summary of BLG

Complete lack of embedded C/HaskellData format defined generically

Can be Haskell linked listCan be C arrayThere is an XML format defined

Similar in style to each otherAll "simple" langauges

© Neil Mitchell 2004 20

Implementation

BracketDeterministic Push down stack automata

LexSteal existing lex, FSA

GroupFSA? Maybe…Have a sketched automaton

© Neil Mitchell 2004 21

Implementation (2)

I have implemented most of it in C#Slow, but very useableBracket seems pretty perfectLex uses Regex objects, but worksGroup is less complete, uses backtracking, doesn't have maximal munch semantics, NP, etc.

© Neil Mitchell 2004 22

Implementation (3)

BLG is self-parsing ☺1 Lex file for all 31 Bracket file for all 33 Group files, one each

Reuse is good

© Neil Mitchell 2004 23

Interaction

How do you interact with a parser?Yacc/Lex

Translate, Compile, Link, Execute

BLGTranslate, Compile, Link, ExecuteCompile into resource fileLoad at runtime (Text Editors)

© Neil Mitchell 2004 24

Exclamations!

BLG defines a complete set of exclamations which allow for code hoisting and deletingRemove tokens from the output (white space/comments)Promote tokens, i.e. line![ x ] returns xSimple, but ignored here

© Neil Mitchell 2004 25

$Directives

In the Bracket, before any processingStream processing directives$text (remove '\r', append '\n')$tab-indent (for Haskell/Python)$upper-caseEasy, simple, generic, reusable

© Neil Mitchell 2004 26

Advantages

Language neutralHaskell parsing

GHC in HaskellHugs in CCould now use the same grammar

Can reuse elements, i.e. Lex and Bracket are almost identical for C#/Java

© Neil Mitchell 2004 27

But best of all

The grammars are really easy to specifyA bit of a leapWould need years of hypothesis testingAnd maybe even a working implementation

FasterAlmost irrelevant, thanks to faster computers

© Neil Mitchell 2004 28

Questions?

What did I explain badly?I would really appreciate any feedback!Should I ditch the entire idea?Should I implement it?Should I give up my PhD to sell this

system?