Understanding Parser
Combinators
@ScottWlaschin
fsharpforfunandprofit.com/parser
let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"
let point = pchar '.'
let e = pchar 'e' <|> pchar 'E'
let optPlusMinus = opt (pchar '-' <|> pchar '+')
let nonZeroInt =
digitOneNine .>>. manyChars digit
|>> fun (first,rest) -> string first + rest
let intPart = zero <|> nonZeroInt
let fractionPart = point >>. manyChars1 digit
let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
Typical code using parser combinators
let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"
let point = pchar '.'
let e = pchar 'e' <|> pchar 'E'
let optPlusMinus = opt (pchar '-' <|> pchar '+')
let nonZeroInt =
digitOneNine .>>. manyChars digit
|>> fun (first,rest) -> string first + rest
let intPart = zero <|> nonZeroInt
let fractionPart = point >>. manyChars1 digit
let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
Overview
1. What is a parser combinator library?
2. The foundation: a simple parser
3. Three basic parser combinators
4. Building combinators from other combinators
5. Improving the error messages
6. Building a JSON parser
Part 1
What is a parser combinator
library?
Something to match
Parser<something> Create step in parsing recipe
Creating a parsing recipe
A “Parser-making" function
This is a recipe to make something, not the thing
itself
Parser<thingC>
Combining parsing recipes
A recipe to make a more complicated thing
Parser<thingA> Parser<thingB> combined
with
A "combinator"
Parser<something> Run
Running a parsing recipe
input
Success or
Failure
Why parser combinators?
• Written in your favorite programming language
• No preprocessing needed
– Lexing, parsing, AST transform all in one.
– REPL-friendly
• Easy to create little DSLs
– Google "fogcreek fparsec"
• Fun way of understanding functional composition
Part 2:
A simple parser
Version 1 – parse the character 'A'
input
pcharA remaining input
true/false
Version 1 – parse the character 'A'
input
pcharA remaining input
true/false
let pcharA input =
if String.IsNullOrEmpty(input) then
(false,"")
else if input.[0] = 'A' then
let remaining = input.[1..]
(true,remaining)
else
(false,input)
Version 2 – parse any character
matched char input
pchar remaining input
charToMatch failure message
let pchar (charToMatch,input) =
if String.IsNullOrEmpty(input) then
"No more input"
else
let first = input.[0]
if first = charToMatch then
let remaining = input.[1..]
(charToMatch,remaining)
else
sprintf "Expecting '%c'. Got '%c'" charToMatch first
Fix – create a choice type to capture either case
Success: matched char input
pchar Success: remaining input
charToMatch Failure: message
type Result<'a> =
| Success of 'a
| Failure of string
Fix – create a choice type to capture either case
Success: matched char input
pchar Success: remaining input
charToMatch Failure: message
type Result<'a> =
| Success of 'a
| Failure of string
Fix – create a choice type to capture either case
Success: matched char input
pchar Success: remaining input
charToMatch Failure: message
type Result<'a> =
| Success of 'a
| Failure of string
let pchar (charToMatch,input) =
if String.IsNullOrEmpty(input) then
Failure "No more input"
else
let first = input.[0]
if first = charToMatch then
let remaining = input.[1..]
Success (charToMatch,remaining)
else
let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
Failure msg
Version 3 – returning a function
Success: matched char input
pchar Success: remaining input
charToMatch Failure: message
Version 3 – returning a function
Success: matched char input
pchar Success: remaining input
charToMatch Failure: message
Version 3 – returning a function
input pchar
charToMatch
Version 3 – returning a function
charToMatch
pchar
Version 3 – returning a function
charToMatch
pchar
Version 4 – wrapping the function in a type
charToMatch
pchar Parser<char>
Version 4 – wrapping the function in a type
charToMatch
pchar Parser<char>
type Parser<'a> = Parser of (string -> Result<'a * string>)
A function that takes a string and returns a Result
Version 4 – wrapping the function in a type
charToMatch
pchar Parser<char>
type Parser<'a> = Parser of (string -> Result<'a * string>)
Wrapper
Creating parsing recipes
charToMatch input
Parser<char>
A parsing recipe for a char
Parser<something> Run
Running a parsing recipe
input
Success, or
Failure
Running a parsing recipe
input
Parser<something>
Parser<something> Run
input
Success, or
Failure
let run parser input =
// unwrap parser to get inner function
let (Parser innerFn) = parser
// call inner function with input
innerFn input
Enough talk,
show me some code
Part 3:
Three basic combinators
What is a combinator?
• A “combinator” library is a library designed around
combining things to get more complex values of
the same type.
• integer + integer = integer
• list @ list = list // @ is list concat
• Parser ?? Parser = Parser
Basic parser combinators
• Parser andThen Parser => Parser
• Parser orElse Parser => Parser
• Parser map (transformer) => Parser
AndThen parser combinator
• Run the first parser.
– If there is a failure, return.
• Otherwise, run the second parser with the
remaining input.
– If there is a failure, return.
• If both parsers succeed, return a pair (tuple)
that contains both parsed values.
let andThen parser1 parser2 =
let innerFn input =
// run parser1 with the input
let result1 = run parser1 input
// test the 1st parse result for Failure/Success
match result1 with
| Failure err ->
Failure err // return error from parser1
| Success (value1,remaining1) ->
// run parser2 with the remaining input
(continued on next slide..)
let andThen parser1 parser2 =
[...snip...]
let result2 = run parser2 remaining1
// test the 2nd parse result for Failure/Success
match result2 with
| Failure err ->
Failure err // return error from parser2
| Success (value2,remaining2) ->
let combinedValue = (value1,value2)
Success (combinedValue,remaining2)
// return the inner function
Parser innerFn
OrElse parser combinator
• Run the first parser.
• On success, return the parsed value, along
with the remaining input.
• Otherwise, on failure, run the second parser
with the original input...
• ...and in this case, return the result (success or
failure) from the second parser.
let orElse parser1 parser2 =
let innerFn input =
// run parser1 with the input
let result1 = run parser1 input
// test the result for Failure/Success
match result1 with
| Success result ->
// if success, return the original result
result1
| Failure err ->
// if failed, run parser2 with the input
(continued on next slide..)
let orElse parser1 parser2 =
[...snip...]
| Failure err ->
// if failed, run parser2 with the input
let result2 = run parser2 input
// return parser2's result
result2
// return the inner function
Parser innerFn
Map parser combinator
• Run the parser.
• On success, transform the parsed value using
the provided function.
• Otherwise, return the failure
let mapP f parser =
let innerFn input =
// run parser with the input
let result = run parser input
// test the result for Failure/Success
match result with
| Success (value,remaining) ->
// if success, return the value transformed by f
let newValue = f value
Success (newValue, remaining)
(continued on next slide..)
let mapP f parser =
[...snip...]
| Failure err ->
// if failed, return the error
Failure err
// return the inner function
Parser innerFn
Parser combinator operators
pcharA .>>. pcharB // 'A' andThen 'B'
pcharA <|> pcharB // 'A' orElse 'B'
pcharA |>> (...) // map ch to something
Demo
Part 4:
Building complex combinators from
these basic ones
[ 1; 2; 3] |> List.reduce (+)
// 1 + 2 + 3
[ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )
// pcharA .>>. pcharB .>>. pcharC
[ pcharA; pcharB; pcharC] |> List.reduce ( <|> )
// pcharA <|> pcharB <|> pcharC
Using reduce to combine parsers
let choice listOfParsers =
listOfParsers |> List.reduce ( <|> )
let anyOf listOfChars =
listOfChars
|> List.map pchar // convert char into Parser<char>
|> choice // combine them all
let parseLowercase = anyOf ['a'..'z']
let parseDigit = anyOf ['0'..'9']
Using reduce to combine parsers
/// Convert a list of parsers into a Parser of list
let sequence listOfParsers =
let concatResults p1 p2 = // helper
p1 .>>. p2
|>> (fun (list1,list2) -> list1 @ list2)
listOfParsers
// map each parser result to a list
|> Seq.map (fun parser -> parser |>> List.singleton)
// reduce by concatting the results of AndThen
|> Seq.reduce concatResults
Using reduce to combine parsers
/// match a specific string
let pstring str =
str
// map each char to a pchar
|> Seq.map pchar
// convert to Parser<char list>
|> sequence
// convert Parser<char list> to Parser<char array>
|>> List.toArray
// convert Parser<char array> to Parser<string>
|>> String
Using reduce to combine parsers
Demo
Yet more combinators
“More than one” combinators
let many p = ... // zero or more
let many1 p = ... // one or more
let opt p = ... // zero or one
// example
let whitespaceChar = anyOf [' '; '\t'; '\n']
let whitespace = many1 whitespaceChar
“Throwing away” combinators p1 .>> p2 // throw away right side
p1 >>. p2 // throw away left side
// keep only the inside value
let between p1 p2 p3 = p1 >>. p2 .>> p3
// example
let pdoublequote = pchar '"'
let quotedInt = between pdoublequote pint pdoublequote
“Separator” combinators
let sepBy1 p sep = ... /// one or more p separated by sep
let sepBy p sep = ... /// zero or more p separated by sep
// example
let comma = pchar ','
let digit = anyOf ['0'..'9']
let oneOrMoreDigitList = sepBy1 digit comma
Demo
Part 5:
Improving the error messages
input
Parser<char>
Named parsers
Name: “Digit”
Parsing Function:
Named parsers
let ( <?> ) = setLabel // infix version
run parseDigit "ABC" // without the label
// Error parsing "9" : Unexpected 'A'
let parseDigit_WithLabel = anyOf ['0'..'9'] <?> "digit"
run parseDigit_WithLabel "ABC" // with the label
// Error parsing "digit" : Unexpected 'A'
input
Parser<char>
Extra input context
Input: * Stream of characters * Line, Column
Extra input context
run pint "-Z123"
// Line:0 Col:1 Error parsing integer
// -Z123
// ^Unexpected 'Z'
run pfloat "-123Z45"
// Line:0 Col:4 Error parsing float
// -123Z45
// ^Unexpected 'Z'
Part 6:
Building a JSON Parser
// A type that represents the previous diagram
type JValue =
| JString of string
| JNumber of float
| JObject of Map<string, JValue>
| JArray of JValue list
| JBool of bool
| JNull
Parsing JSON Null
// new helper operator.
let (>>%) p x =
p |>> (fun _ -> x) // runs parser p, but ignores the result
// Parse a "null"
let jNull =
pstring "null"
>>% JNull // map to JNull
<?> "null" // give it a label
Parsing JSON Bool
// Parse a boolean
let jBool =
let jtrue = pstring "true"
>>% JBool true // map to JBool
let jfalse = pstring "false"
>>% JBool false // map to JBool
// choose between true and false
jtrue <|> jfalse
<?> "bool" // give it a label
Parsing a JSON String
Call this "unescaped char"
/// Parse an unescaped char
let jUnescapedChar =
let label = "char"
satisfy (fun ch -> (ch <> '\\') && (ch <> '\"') ) label
Call this "escaped char"
let jEscapedChar =
[ // each item is (stringToMatch, resultChar)
("\\\"",'\"') // quote
("\\\\",'\\') // reverse solidus
("\\/",'/') // solidus
("\\b",'\b') // backspace
("\\f",'\f') // formfeed
("\\n",'\n') // newline
("\\r",'\r') // cr
("\\t",'\t') // tab
]
// convert each pair into a parser
|> List.map (fun (toMatch,result) -> pstring toMatch >>% result)
// and combine them into one
|> choice
<?> "escaped char" // set label
Call this "unicode char"
"unescaped char" or
"escaped char" or
"unicode char"
let quotedString =
let quote = pchar '\"' <?> "quote"
let jchar =
jUnescapedChar <|> jEscapedChar <|> jUnicodeChar
// set up the main parser
quote >>. manyChars jchar .>> quote
let jString =
// wrap the string in a JString
quotedString
|>> JString // convert to JString
<?> "quoted string" // add label
Parsing a JSON Number
"int part"
"sign part"
let optSign = opt (pchar '-')
let zero = pstring "0"
let digitOneNine =
satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"
let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"
let nonZeroInt =
digitOneNine .>>. manyChars digit
|>> fun (first,rest) -> string first + rest
// set up the integer part
let intPart = zero <|> nonZeroInt
"fraction part"
// set up the fraction part
let point = pchar '.'
let fractionPart =
point >>. manyChars1 digit
"exponent part"
// set up the exponent part
let e = pchar 'e' <|> pchar 'E'
let optPlusMinus = opt (pchar '-' <|> pchar '+')
let exponentPart =
e >>. optPlusMinus .>>. manyChars1 digit
"exponent part"
"int part"
"fraction part" "sign part"
// set up the main JNumber parser
optSign
.>>. intPart
.>>. opt fractionPart
.>>. opt exponentPart
|>> convertToJNumber // not shown
<?> "number" // add label
Parsing JSON Arrays and Objects
Completing the JSON Parser
// the final parser combines the others together
let jValue = choice
[
jNull
jBool
jNumber
jString
jArray
jObject
]
Demo: the JSON parser in action
Summary • Treating a function like an object
– Returning a function from a function
– Wrapping a function in a type
• Working with a "recipe" (aka "effect")
– Combining recipes before running them.
• The power of combinators
– A few basic combinators: "andThen", "orElse", etc.
– Complex parsers are built from smaller components.
• Combinator libraries are small but powerful
– Less than 500 lines for combinator library
– Less than 300 lines for JSON parser itself
Want more? • For a production-ready library for F#,
search for "fparsec"
• There are similar libraries for other languages
Thanks!
@ScottWlaschin
fsharpforfunandprofit.com/parser
Contact me
Slides and video here
Let us know if you need help with F#