swanand-pagnis
In combinator parsing, the text of parsers resembles BNF notation. We present the basic method, and a number of extensions. We address the special problems presented by whitespace, and parsers with separate lexical and syntactic phases. In particular, a combining form for handling the “offside rule” is given. Other extensions to the basic method include an “into” combining form with many useful applications, and a simple means by which combinator parsers can produce more informative error messages.
• Combinators that resemble BNF notation
• Whitespace handling through "Offside Rule"
• "Into" combining form for advanced parsing
• Strategy for better error messages
Lexical analysis and syntax
• Combine the combinators
• Define lexical elements
• Return results and unused input
Simple when we stick to FP fundamentals
• Higher order functions
• Immutability
• Recursive problem solving
• Algebraic types
Types help with abstraction
• We'll be dealing with parsers and combinators
• Parsers are functions: they accept input and return results
• Combinators accept parsers and return parsers
A parser is a function that accepts an input and returns parsed results and the unused input for each result
Parser is a function type that accepts a list of type a and returns all possible results as a list of tuples of type (b, [a])
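The description above can be written down directly as a type synonym. The following is a minimal sketch, not necessarily the code used in the talk: `succeed` and `failure` follow the names used later in the slides, while `leadingInt` is a hypothetical toy parser added only to show the shape of a result.

```haskell
-- A parser consumes a list of input symbols of type a and returns every
-- possible parse as a (result, unused input) pair.
type Parser a b = [a] -> [(b, [a])]

-- Always succeeds with the given value and consumes no input.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Always fails: an empty result list means "no parse".
failure :: Parser a b
failure _ = []

-- Hypothetical toy parser: reads a leading run of digits as an Int.
leadingInt :: Parser Char Int
leadingInt inp = case span (`elem` "0123456789") inp of
  ([], _)        -> failure inp
  (digits, rest) -> succeed (read digits) rest
```

With these definitions, `leadingInt "42 it is!"` evaluates to `[(42, " it is!")]`, and `leadingInt "abc"` evaluates to `[]`.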
(Parser Char Number)
input:  "42 it is!"         -- the input is a [Char], so a is Char
output: [(42, " it is!")]   -- b is a Number
satisfy :: (a -> Bool) -> Parser a a
satisfy p []     = failure []
satisfy p (x:xs)
  | p x       = succeed x xs   -- if p(x) is true
  | otherwise = failure []
Guard clauses, if you want to Google them
match_3_or_4 = match_3 `alt` match_4
match_3_or_4 "345" -- => [('3',"45")]
match_3_or_4 "456" -- => [('4',"56")]
and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]
List comprehensions
[Diagram: p1 yields the results (v11, out11), (v12, out12), (v13, out13), …; p2 is then run on each remaining input, yielding (v21, out21), (v22, out22), … and (v31, out31), (v32, out32), …]
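To make the fan-out concrete, here is a small sketch in which the first parser is deliberately ambiguous. `oneOrTwo` and `one` are hypothetical parsers invented for this illustration; only `and_then` comes from the slides.

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Sequencing: run p2 on the unused input of every result of p1.
and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

-- Hypothetical ambiguous parser: takes the first one OR the first two symbols.
oneOrTwo :: Parser Char String
oneOrTwo inp = [ (take n inp, drop n inp) | n <- [1, 2], length inp >= n ]

-- Hypothetical parser: takes exactly one symbol.
one :: Parser Char Char
one []     = []
one (x:xs) = [(x, xs)]
```

Here `(oneOrTwo `and_then` one) "abc"` evaluates to `[(("a",'b'),"c"),(("ab",'c'),"")]`: each of the two results of the first parser is continued independently by the second.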
using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]
many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` (succeed [])
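The slides use `some` and `cons` without showing them. The sketch below fills them in under stated assumptions: `alt` is taken to be concatenation of result lists, `cons` is the uncurried list constructor (since `using` hands it a pair), and `some` is "one `p` followed by `many p`". These are the conventional definitions, not necessarily the ones from the talk.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy _ []     = []
satisfy p (x:xs)
  | p x       = succeed x xs
  | otherwise = []

-- Assumed definition: try both parsers and keep all results.
alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

-- Assumed definition: uncurried (:), because `using` passes a pair.
cons :: (b, [b]) -> [b]
cons (x, xs) = x : xs

-- Zero or more repetitions of p.
many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` succeed []

-- Assumed definition: one or more repetitions of p.
some :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` cons
```

`many (satisfy isDigit) "12a"` evaluates to `[("12","a"),("1","2a"),("","12a")]`: the longest match comes first, followed by the shorter alternatives. `some (satisfy isDigit) "abc"` evaluates to `[]`, because at least one digit is required.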
positive_integer = some (satisfy Data.Char.isDigit)
negative_integer = ((literal '-') `and_then` positive_integer) `using` cons
positive_decimal = (positive_integer `and_then` (((literal '.') `and_then` positive_integer) `using` cons)) `using` join
negative_decimal = ((literal '-') `and_then` positive_decimal) `using` cons
number :: Parser Char [Char]
number = negative_decimal `alt` positive_decimal `alt` negative_integer `alt` positive_integer
string :: (Eq a) => [a] -> Parser a [a]
string []     = succeed []
string (x:xs) = (literal x `and_then` string xs) `using` cons
succeed, failure, satisfy, literal, alt, and_then, using, string, many, some, word, number, xthen, thenx, ret
Improving a little:
expn   ::= term + term | term − term | term
term   ::= factor ∗ factor | factor / factor | factor
factor ::= digit+ | (expn)
parenthesised_expression = ((nibble (literal '(')) `xthen` ((nibble expn) `thenx` (nibble (literal ')'))))
value xs     = Const (numval xs)
plus (x,y)   = x `Add` y
minus (x,y)  = x `Sub` y
times (x,y)  = x `Mul` y
divide (x,y) = x `Div` y
expn "12*(5+(7-2))"
-- => [ (Const 12.0 `Mul` (Const 5.0 `Add` (Const 7.0 `Sub` Const 2.0)),""), … ]
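Putting the grammar and the semantic actions together, a self-contained sketch of the expression parser might look like the following. Several parts are assumptions: the `Expr` type name, the definitions of `alt`, `xthen`, `thenx`, `some` and `cons`, and the use of `read` in place of the talk's `numval`. Whitespace handling (`nibble`) is left out to keep the sketch short.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

-- Assumed syntax tree type; constructor names follow the slides.
data Expr = Const Double | Add Expr Expr | Sub Expr Expr
          | Mul Expr Expr | Div Expr Expr
  deriving (Show, Eq)

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy _ []     = []
satisfy p (x:xs)
  | p x       = [(x, xs)]
  | otherwise = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

-- Sequence two parsers, keeping only the second (xthen) or first (thenx) result.
xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

cons :: (b, [b]) -> [b]
cons (x, xs) = x : xs

many, some :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` succeed []
some p = (p `and_then` many p) `using` cons

-- Semantic actions; `read` stands in for the talk's numval.
value :: String -> Expr
value xs = Const (read xs)

plus, minus, times, divide :: (Expr, Expr) -> Expr
plus   (x, y) = Add x y
minus  (x, y) = Sub x y
times  (x, y) = Mul x y
divide (x, y) = Div x y

-- One definition per grammar rule.
expn, term, factor :: Parser Char Expr
expn = ((term `and_then` (literal '+' `xthen` term)) `using` plus)
  `alt` ((term `and_then` (literal '-' `xthen` term)) `using` minus)
  `alt` term
term = ((factor `and_then` (literal '*' `xthen` factor)) `using` times)
  `alt` ((factor `and_then` (literal '/' `xthen` factor)) `using` divide)
  `alt` factor
factor = (some (satisfy isDigit) `using` value)
  `alt` (literal '(' `xthen` (expn `thenx` literal ')'))
```

The first result of `expn "12*(5+(7-2))"` is the complete parse, `(Mul (Const 12.0) (Add (Const 5.0) (Sub (Const 7.0) (Const 2.0))), "")`; the remaining results are partial parses that leave input unconsumed.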
The parser (nibble p) has the same behaviour as parser p, except that it eats up any white-space in the input string before or afterwards
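As a rough illustration of that behaviour, here is a simplified sketch of `nibble`. It is an assumption, not the talk's definition: it strips whitespace deterministically with `dropWhile`, whereas a version built from `many` would also return the shorter alternatives. `literal` is included only so the example can be run.

```haskell
import Data.Char (isSpace)

type Parser a b = [a] -> [(b, [a])]

-- Matches exactly the given symbol.
literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

-- Simplified nibble: drop whitespace before running p and after each result.
nibble :: Parser Char b -> Parser Char b
nibble p inp =
  [ (v, dropWhile isSpace out) | (v, out) <- p (dropWhile isSpace inp) ]
```

For instance, `nibble (literal '(') "  ( 1"` evaluates to `[('(', "1")]`, while the bare `literal '(' "  ( 1"` fails with `[]`.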
When obeying the offside rule, every token must lie either directly below, or to the right of its first token
satisfy :: (a -> Bool) -> Parser a a
satisfy p []     = failure []
satisfy p (x:xs)
  | p x       = succeed x xs   -- if p(x) is true
  | otherwise = failure []
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p []     = failure []
satisfy p (x:xs)
  | p a       = succeed a xs   -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON  = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c
(offside expn) (prelex inp_1)
-- => [(21.0,[('+',(2,0)),('(',(2,2)),('8',(2,3)),('*',(2,5)),('1',(2,7)),('0',(2,8)),(')',(2,9))])]
(offside expn) (prelex inp_2)
-- => [(101.0,[])]
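The examples above rely on `prelex` to tag every character with its position, but its definition is not shown. A minimal sketch, assuming zero-based `(row, column)` positions as in the outputs above and ignoring tab stops, could be written as follows. Whether the talk's version keeps or drops whitespace characters is not shown in the slides; this sketch keeps them.

```haskell
-- A symbol tagged with its (row, column) position.
type Pos a = (a, (Int, Int))

-- Assumed behaviour: rows and columns start at 0, a newline moves to the
-- start of the next row, and every other character advances one column.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x:xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs
```

For example, `prelex "a\nbc"` evaluates to `[('a',(0,0)),('\n',(0,1)),('b',(1,0)),('c',(1,1))]`.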
∅
|> succeed, fail
|> satisfy, literal
|> alt, and_then, using
|> many, some
|> string, thenx, xthen, return
|> expression parser & evaluator
|> any, nibble, symbol
|> prelex, offside
(p `tok` t) inp = [ ((<token>,<pos>),<unused input>) | (xs, out) <- p inp ]
  where (x, (r,c)) = head inp
lexer = lex [ ((some (any_of literal " \n\t")), Junk),
              ((string "where"), Symbol),
              (word, Ident),
              (number, Number),
              ((any_of string ["(", ")", "="]), Symbol) ]
head (lexer (prelex "where x = 10"))
-- => ([((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10))],[])
(head.lexer.prelex) "where x = 10"
-- => ([((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10))],[])
Function composition
lexer = lex [ {- 1 -} ((some (any_of literal " \n\t")), Junk),
              {- 2 -} ((string "where"), Symbol),
              {- 3 -} (word, Ident),
              {- 4 -} (number, Number),
              {- 5 -} ((any_of string ["(",")","="]), Symbol) ]
((/= Junk).fst.fst) ((Symbol,"where"),(0,0))  -- => True
((/= Junk).fst.fst) ((Junk,"where"),(0,0))    -- => False
(fst.head.lexer.prelex) "where x = 10"
-- => [((Symbol,"where"),(0,0)), ((Junk," "),(0,5)), ((Ident,"x"),(0,6)), ((Junk," "),(0,7)), ((Symbol,"="),(0,8)), ((Junk," "),(0,9)), ((Number,"10"),(0,10))]
(strip.fst.head.lexer.prelex) "where x = 10"
-- => [((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10))]
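The `strip` step is not defined on the slides. Given the `(/= Junk).fst.fst` predicate shown earlier, a plausible sketch is a plain `filter`. The `Tag`, `Token` and `Pos` declarations below are assumptions inferred from the example outputs.

```haskell
-- Assumed declarations, inferred from the lexer output shown above.
data Tag = Ident | Number | Symbol | Junk
  deriving (Show, Eq)

type Token = (Tag, String)
type Pos a = (a, (Int, Int))

-- Drop every token tagged Junk (whitespace), keeping positions intact.
strip :: [Pos Token] -> [Pos Token]
strip = filter ((/= Junk) . fst . fst)
```

Applied to the seven-token list above, `strip` returns the four non-`Junk` tokens with their original positions, which is what the offside rule needs later on.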
data Script = Script [Def]
data Def    = Def Var [Var] Expn
data Expn   = Var Var
            | Num Double
            | Expn `Apply` Expn
            | Expn `Where` [Def]
type Var    = [Char]
prim = ((kind Ident) `using` Var)
       `alt` ((kind Number) `using` numFN)
       `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))
-- only allow a kind of tag
kind :: Tag -> Parser (Pos Token) [Char]
kind t = (satisfy ((== t).fst)) `using` snd
-- only allow a given symbol
lit :: [Char] -> Parser (Pos Token) [Char]
lit xs = (literal (Symbol, xs)) `using` snd
Script [
  Def "f" ["x","y"] (
    ((Var "add" `Apply` Var "a") `Apply` Var "b")
    `Where` [
      Def "a" [] (Num 25.0),
      Def "b" [] ((Var "sub" `Apply` Var "x") `Apply` Var "y")]),
  Def "answer" [] (
    (Var "mult" `Apply` ((Var "f" `Apply` Num 3.0) `Apply` Num 7.0))
    `Apply` Num 5.0)]
defn = ((some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN
body = (expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN
expr = (some prim) `using` (foldl1 Apply)
Haskell: Parsec, Megaparsec. ✨
OCaml: Angstrom. ✨ 🚀
Ruby: rparsec, or roll your own
Elixir: Combine, ExParsec
Python: Parsec. ✨