Transcript
Page 1: Understanding parser combinators

Understanding Parser Combinators

@ScottWlaschin

fsharpforfunandprofit.com/parser

Page 2: Understanding parser combinators

Typical code using parser combinators

let digit = satisfy (fun ch -> Char.IsDigit ch) "digit"
let point = pchar '.'
let e = pchar 'e' <|> pchar 'E'
let optPlusMinus = opt (pchar '-' <|> pchar '+')
let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest
let intPart = zero <|> nonZeroInt
let fractionPart = point >>. manyChars1 digit
let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

Page 4: Understanding parser combinators

Overview

1. What is a parser combinator library?

2. The foundation: a simple parser

3. Three basic parser combinators

4. Building combinators from other combinators

5. Improving the error messages

6. Building a JSON parser

Page 5: Understanding parser combinators

Part 1

What is a parser combinator library?

Page 6: Understanding parser combinators

Creating a parsing recipe

Something to match -> a "parser-making" function -> Parser<something>

The "parser-making" function creates one step in a parsing recipe.

This is a recipe to make something, not the thing itself.

Page 7: Understanding parser combinators

Combining parsing recipes

Parser<thingA> combined with Parser<thingB> => Parser<thingC>

The thing that does the combining is a "combinator".

The result is a recipe to make a more complicated thing.

Page 8: Understanding parser combinators

Running a parsing recipe

Parser<something> + input -> Run -> Success or Failure

Page 9: Understanding parser combinators

Why parser combinators?

• Written in your favorite programming language

• No preprocessing needed

– Lexing, parsing, AST transform all in one.

– REPL-friendly

• Easy to create little DSLs

– Google "fogcreek fparsec"

• Fun way of understanding functional composition

Page 10: Understanding parser combinators

Part 2:

A simple parser

Page 11: Understanding parser combinators

Version 1 – parse the character 'A'

input -> pcharA -> (true/false, remaining input)

Page 13: Understanding parser combinators

let pcharA input =
    if String.IsNullOrEmpty(input) then
        (false,"")
    else if input.[0] = 'A' then
        let remaining = input.[1..]
        (true,remaining)
    else
        (false,input)
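A quick check of Version 1, as a sketch: the function is repeated from the slide, and the three sample inputs are illustrative choices (not from the talk) that exercise the match, mismatch, and empty-input branches.

```fsharp
open System

// Version 1 from the slide: returns (matched?, remaining input)
let pcharA input =
    if String.IsNullOrEmpty(input) then
        (false, "")
    else if input.[0] = 'A' then
        (true, input.[1..])
    else
        (false, input)

// Illustrative inputs (assumptions, not from the slides)
let matched = pcharA "ABC"    // first char is 'A', so it is consumed
let mismatch = pcharA "ZBC"   // first char is not 'A', input is untouched
let noInput = pcharA ""       // nothing to parse
```

The boolean says whether 'A' was found, but it cannot say why a parse failed, which is what the later versions address.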

Page 14: Understanding parser combinators

Version 2 – parse any character

(charToMatch, input) -> pchar -> (matched char, remaining input) or failure message

Page 15: Understanding parser combinators

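// Note: this version does not compile, because the branches return
// different types (a string on failure, a tuple on success)
let pchar (charToMatch,input) =
    if String.IsNullOrEmpty(input) then
        "No more input"
    else
        let first = input.[0]
        if first = charToMatch then
            let remaining = input.[1..]
            (charToMatch,remaining)
        else
            sprintf "Expecting '%c'. Got '%c'" charToMatch first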

Page 16: Understanding parser combinators

Fix – create a choice type to capture either case

(charToMatch, input) -> pchar -> Success: (matched char, remaining input) or Failure: message

type Result<'a> =
    | Success of 'a
    | Failure of string

Page 19: Understanding parser combinators

let pchar (charToMatch,input) =
    if String.IsNullOrEmpty(input) then
        Failure "No more input"
    else
        let first = input.[0]
        if first = charToMatch then
            let remaining = input.[1..]
            Success (charToMatch,remaining)
        else
            let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
            Failure msg
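To see the choice type at work, here is a small sketch that repeats the slide's Result type and pchar, then calls it with three illustrative inputs (the inputs are assumptions chosen for this example).

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

// Version 2 with the fix: both branches now return a Result
let pchar (charToMatch, input) =
    if String.IsNullOrEmpty(input) then
        Failure "No more input"
    else
        let first = input.[0]
        if first = charToMatch then
            Success (charToMatch, input.[1..])
        else
            Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)

// Illustrative calls (not from the slides)
let matched = pchar ('A', "ABC")
let mismatch = pchar ('A', "ZBC")
let noInput = pchar ('A', "")
```

A failure now carries a message instead of a bare false, and the compiler forces callers to handle both cases.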

Page 20: Understanding parser combinators

Version 3 – returning a function

(charToMatch, input) -> pchar -> Success: (matched char, remaining input) or Failure: message

Page 22: Understanding parser combinators

Version 3 – returning a function

[Diagram: pchar takes charToMatch and input as two separate parameters]

Page 23: Understanding parser combinators

Version 3 – returning a function

[Diagram: given only charToMatch, pchar returns a function that takes the input]

Page 25: Understanding parser combinators

Version 4 – wrapping the function in a type

charToMatch -> pchar -> Parser<char>

Page 26: Understanding parser combinators

Version 4 – wrapping the function in a type

charToMatch -> pchar -> Parser<char>

type Parser<'a> = Parser of (string -> Result<'a * string>)

A function that takes a string and returns a Result

Page 27: Understanding parser combinators

Version 4 – wrapping the function in a type

charToMatch -> pchar -> Parser<char>

type Parser<'a> = Parser of (string -> Result<'a * string>)

Wrapper

Page 28: Understanding parser combinators

Creating parsing recipes

Page 29: Understanding parser combinators

charToMatch input

Parser<char>

A parsing recipe for a char

Page 30: Understanding parser combinators

Running a parsing recipe

Parser<something> + input -> Run -> Success or Failure

Page 32: Understanding parser combinators

let run parser input =
    // unwrap parser to get inner function
    let (Parser innerFn) = parser
    // call inner function with input
    innerFn input
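The slides show the Parser wrapper and run, but not the rewritten pchar for Versions 3 and 4. The following is one possible sketch of how those pieces could fit together; the body of pchar is an assumption reconstructed from the Version 2 code, and the sample inputs are illustrative.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

// Assumed Version 4 shape: take only the char, return a wrapped function
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// run, as on the slide: unwrap the function and apply it to the input
let run parser input =
    let (Parser innerFn) = parser
    innerFn input

let parseA = pchar 'A'            // a recipe; nothing has been parsed yet
let matched = run parseA "ABC"    // parsing happens only here
let mismatch = run parseA "ZBC"
```

Building parseA does no work by itself, which is what lets recipes be combined before any input is available.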

Page 33: Understanding parser combinators

Enough talk,

show me some code

Page 34: Understanding parser combinators

Part 3:

Three basic combinators

Page 35: Understanding parser combinators

What is a combinator?

• A “combinator” library is a library designed around combining things to get more complex values of the same type.

• integer + integer = integer

• list @ list = list // @ is list concat

• Parser ?? Parser = Parser

Page 36: Understanding parser combinators

Basic parser combinators

• Parser andThen Parser => Parser

• Parser orElse Parser => Parser

• Parser map (transformer) => Parser

Page 37: Understanding parser combinators

AndThen parser combinator

• Run the first parser.

– If there is a failure, return.

• Otherwise, run the second parser with the remaining input.

– If there is a failure, return.

• If both parsers succeed, return a pair (tuple) that contains both parsed values.

Page 38: Understanding parser combinators

let andThen parser1 parser2 =
    let innerFn input =
        // run parser1 with the input
        let result1 = run parser1 input
        // test the 1st parse result for Failure/Success
        match result1 with
        | Failure err ->
            Failure err // return error from parser1
        | Success (value1,remaining1) ->
            // run parser2 with the remaining input
            (continued on next slide..)

Page 39: Understanding parser combinators

let andThen parser1 parser2 =
    [...snip...]
            let result2 = run parser2 remaining1
            // test the 2nd parse result for Failure/Success
            match result2 with
            | Failure err ->
                Failure err // return error from parser2
            | Success (value2,remaining2) ->
                let combinedValue = (value1,value2)
                Success (combinedValue,remaining2)
    // return the inner function
    Parser innerFn
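The two halves of andThen can be assembled and exercised as follows. This is a sketch: Result, Parser, and run follow the earlier slides, the pchar body is an assumed reconstruction, and the sample inputs are illustrative.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed, not shown in this form on the slides)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// andThen, joined from the two slides
let andThen parser1 parser2 =
    let innerFn input =
        match run parser1 input with
        | Failure err -> Failure err
        | Success (value1, remaining1) ->
            match run parser2 remaining1 with
            | Failure err -> Failure err
            | Success (value2, remaining2) ->
                Success ((value1, value2), remaining2)
    Parser innerFn

let parseAThenB = andThen (pchar 'A') (pchar 'B')
let bothMatch = run parseAThenB "ABC"     // pair of values plus the rest
let secondFails = run parseAThenB "AZC"   // error comes from the second parser
```

Note that the combined parser has type Parser<char * char>: andThen changes the result type, unlike orElse.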

Page 40: Understanding parser combinators

OrElse parser combinator

• Run the first parser.

• On success, return the parsed value, along with the remaining input.

• Otherwise, on failure, run the second parser with the original input...

• ...and in this case, return the result (success or failure) from the second parser.

Page 41: Understanding parser combinators

let orElse parser1 parser2 =
    let innerFn input =
        // run parser1 with the input
        let result1 = run parser1 input
        // test the result for Failure/Success
        match result1 with
        | Success result ->
            // if success, return the original result
            result1
        | Failure err ->
            // if failed, run parser2 with the input
            (continued on next slide..)

Page 42: Understanding parser combinators

let orElse parser1 parser2 =
    [...snip...]
        | Failure err ->
            // if failed, run parser2 with the input
            let result2 = run parser2 input
            // return parser2's result
            result2
    // return the inner function
    Parser innerFn
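A matching sketch for orElse, under the same assumptions as before (the pchar body is a reconstruction and the sample inputs are illustrative). The key detail is that the second parser is given the original input, not the remainder.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// orElse, joined from the two slides
let orElse parser1 parser2 =
    let innerFn input =
        match run parser1 input with
        | Success _ as result1 -> result1   // first parser wins
        | Failure _ -> run parser2 input    // retry from the original input
    Parser innerFn

let parseAOrB = orElse (pchar 'A') (pchar 'B')
let firstWins = run parseAOrB "AZZ"
let secondWins = run parseAOrB "BZZ"
let bothFail = run parseAOrB "CZZ"   // reports the second parser's error
```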

Page 43: Understanding parser combinators

Map parser combinator

• Run the parser.

• On success, transform the parsed value using the provided function.

• Otherwise, return the failure.

Page 44: Understanding parser combinators

let mapP f parser =
    let innerFn input =
        // run parser with the input
        let result = run parser input
        // test the result for Failure/Success
        match result with
        | Success (value,remaining) ->
            // if success, return the value transformed by f
            let newValue = f value
            Success (newValue, remaining)
        (continued on next slide..)

Page 45: Understanding parser combinators

let mapP f parser =
    [...snip...]
        | Failure err ->
            // if failed, return the error
            Failure err
    // return the inner function
    Parser innerFn
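As an illustration of mapP, the sketch below turns a parsed digit character into an int. The digit-to-int transformer and the sample inputs are assumptions made for this example; the pchar body is reconstructed as before.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// mapP, joined from the two slides
let mapP f parser =
    let innerFn input =
        match run parser input with
        | Success (value, remaining) -> Success (f value, remaining)
        | Failure err -> Failure err
    Parser innerFn

// Hypothetical transformer: digit char -> int
let charToInt (ch: char) = int ch - int '0'

let parseSevenAsInt = mapP charToInt (pchar '7')   // Parser<char> becomes Parser<int>
let mapped = run parseSevenAsInt "7x"
let untouchedFailure = run parseSevenAsInt "8x"    // failures pass through unchanged
```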

Page 46: Understanding parser combinators

Parser combinator operators

pcharA .>>. pcharB // 'A' andThen 'B'

pcharA <|> pcharB // 'A' orElse 'B'

pcharA |>> (...) // map ch to something
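The slide uses the operators without defining them. One plausible set of definitions is sketched below, written as thin wrappers over compact versions of andThen, orElse, and mapP; treat the exact definitions as assumptions rather than the talk's code.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// Compact versions of the three basic combinators
let andThen p1 p2 =
    Parser (fun input ->
        match run p1 input with
        | Failure err -> Failure err
        | Success (v1, rest1) ->
            match run p2 rest1 with
            | Failure err -> Failure err
            | Success (v2, rest2) -> Success ((v1, v2), rest2))

let orElse p1 p2 =
    Parser (fun input ->
        match run p1 input with
        | Success _ as result -> result
        | Failure _ -> run p2 input)

let mapP f p =
    Parser (fun input ->
        match run p input with
        | Success (v, rest) -> Success (f v, rest)
        | Failure err -> Failure err)

// Infix forms (assumed definitions)
let ( .>>. ) p1 p2 = andThen p1 p2
let ( <|> ) p1 p2 = orElse p1 p2
let ( |>> ) p f = mapP f p      // note the flipped argument order

let pcharA = pchar 'A'
let pcharB = pchar 'B'

let seqResult = run (pcharA .>>. pcharB) "ABC"
let altResult = run (pcharA <|> pcharB) "BCD"
let mapResult = run (pcharA |>> Char.ToLower) "ABC"
```

The flipped arguments in |>> let a parser be piped into its transformer, mirroring how |> reads for ordinary values.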

Page 47: Understanding parser combinators

Demo

Page 48: Understanding parser combinators

Part 4:

Building complex combinators from these basic ones

Page 49: Understanding parser combinators

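Using reduce to combine parsers

[ 1; 2; 3] |> List.reduce (+)
// 1 + 2 + 3

[ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )
// pcharA .>>. pcharB .>>. pcharC
// (conceptual only: .>>. pairs up its results, so the type changes at
// each step and List.reduce will not type-check; see "sequence")

[ pcharA; pcharB; pcharC] |> List.reduce ( <|> )
// pcharA <|> pcharB <|> pcharC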

Page 50: Understanding parser combinators

Using reduce to combine parsers

let choice listOfParsers =
    listOfParsers |> List.reduce ( <|> )

let anyOf listOfChars =
    listOfChars
    |> List.map pchar // convert char into Parser<char>
    |> choice // combine them all

let parseLowercase = anyOf ['a'..'z']
let parseDigit = anyOf ['0'..'9']

Page 51: Understanding parser combinators

Using reduce to combine parsers

/// Convert a list of parsers into a Parser of list
let sequence listOfParsers =
    let concatResults p1 p2 = // helper
        p1 .>>. p2
        |>> (fun (list1,list2) -> list1 @ list2)
    listOfParsers
    // map each parser result to a list
    |> Seq.map (fun parser -> parser |>> List.singleton)
    // reduce by concatting the results of AndThen
    |> Seq.reduce concatResults

Page 52: Understanding parser combinators

Using reduce to combine parsers

/// match a specific string
let pstring str =
    str
    // map each char to a pchar
    |> Seq.map pchar
    // convert to Parser<char list>
    |> sequence
    // convert Parser<char list> to Parser<char array>
    |>> List.toArray
    // convert Parser<char array> to Parser<string>
    |>> String

Page 53: Understanding parser combinators

Demo

Page 54: Understanding parser combinators

Yet more combinators

Page 55: Understanding parser combinators

“More than one” combinators

let many p = ... // zero or more

let many1 p = ... // one or more

let opt p = ... // zero or one

// example

let whitespaceChar = anyOf [' '; '\t'; '\n']

let whitespace = many1 whitespaceChar
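The bodies of many, many1, and opt are elided on the slide. A minimal sketch of how they might be written directly against the Parser type follows; the helper name parseZeroOrMore and all three bodies are assumptions for illustration, and the sketch assumes the repeated parser always consumes input when it succeeds.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// Hypothetical helper: apply the parser until it fails, collecting values
let rec parseZeroOrMore parser input =
    match run parser input with
    | Failure _ -> ([], input)
    | Success (first, rest) ->
        let (others, remaining) = parseZeroOrMore parser rest
        (first :: others, remaining)

// zero or more: never fails
let many parser =
    Parser (fun input -> Success (parseZeroOrMore parser input))

// one or more: the first match is mandatory
let many1 parser =
    Parser (fun input ->
        match run parser input with
        | Failure err -> Failure err
        | Success (first, rest) ->
            let (others, remaining) = parseZeroOrMore parser rest
            Success (first :: others, remaining))

// zero or one: wraps the value in an option
let opt parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, rest) -> Success (Some value, rest)
        | Failure _ -> Success (None, input))

let manyA = run (many (pchar 'A')) "AAB"
let manyNone = run (many (pchar 'A')) "B"
let many1None = run (many1 (pchar 'A')) "B"
let optMinus = run (opt (pchar '-')) "-1"
```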

Page 56: Understanding parser combinators

“Throwing away” combinators

p1 .>> p2 // throw away right side

p1 >>. p2 // throw away left side

// keep only the inside value

let between p1 p2 p3 = p1 >>. p2 .>> p3

// example

let pdoublequote = pchar '"'

let quotedInt = between pdoublequote pint pdoublequote
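The two "throwing away" operators can plausibly be built from andThen and mapP by keeping one half of the pair. The sketch below assumes those definitions and, because pint is not shown on the slides, uses a single quoted character instead of a quoted int.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

let andThen p1 p2 =
    Parser (fun input ->
        match run p1 input with
        | Failure err -> Failure err
        | Success (v1, rest1) ->
            match run p2 rest1 with
            | Failure err -> Failure err
            | Success (v2, rest2) -> Success ((v1, v2), rest2))

let mapP f p =
    Parser (fun input ->
        match run p input with
        | Success (v, rest) -> Success (f v, rest)
        | Failure err -> Failure err)

// Assumed definitions: run both parsers, keep one side of the pair
let ( .>> ) p1 p2 = mapP fst (andThen p1 p2)   // keep left, drop right
let ( >>. ) p1 p2 = mapP snd (andThen p1 p2)   // drop left, keep right

// between, as on the slide
let between p1 p2 p3 = p1 >>. p2 .>> p3

let pdoublequote = pchar '"'
let quotedSeven = between pdoublequote (pchar '7') pdoublequote

let inner = run quotedSeven "\"7\"!"     // quotes are consumed but discarded
let unclosed = run quotedSeven "\"7!"    // closing quote is still required
```

The discarded parser must still succeed; only its value is dropped.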

Page 57: Understanding parser combinators

“Separator” combinators

let sepBy1 p sep = ... /// one or more p separated by sep

let sepBy p sep = ... /// zero or more p separated by sep

// example

let comma = pchar ','

let digit = anyOf ['0'..'9']

let oneOrMoreDigitList = sepBy1 digit comma
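The separator combinators are also elided on the slide. Here is one way they could be sketched, written directly as loops over the input rather than in terms of other combinators; both bodies and the simplified digit parser are assumptions for illustration.

```fsharp
open System

type Result<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> Result<'a * string>)

let run parser input =
    let (Parser innerFn) = parser
    innerFn input

// Assumed single-character parser (reconstructed)
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                Success (charToMatch, input.[1..])
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// Simplified stand-in for the slide's digit parser
let digit =
    let innerFn (input: string) =
        if not (String.IsNullOrEmpty(input)) && Char.IsDigit(input.[0]) then
            Success (input.[0], input.[1..])
        else
            Failure "Expecting digit"
    Parser innerFn

// one or more p separated by sep (assumed implementation)
let sepBy1 p sep =
    let innerFn input =
        match run p input with
        | Failure err -> Failure err
        | Success (first, rest) ->
            // keep taking "sep then p" until either one fails;
            // a trailing separator is left unconsumed
            let rec loop acc remaining =
                match run sep remaining with
                | Failure _ -> (List.rev acc, remaining)
                | Success (_, afterSep) ->
                    match run p afterSep with
                    | Failure _ -> (List.rev acc, remaining)
                    | Success (value, afterP) -> loop (value :: acc) afterP
            Success (loop [first] rest)
    Parser innerFn

// zero or more p separated by sep: fall back to an empty list
let sepBy p sep =
    let innerFn input =
        match run (sepBy1 p sep) input with
        | Success _ as result -> result
        | Failure _ -> Success ([], input)
    Parser innerFn

let comma = pchar ','
let three = run (sepBy1 digit comma) "1,2,3;"
let trailing = run (sepBy1 digit comma) "1,2,;"
let none = run (sepBy digit comma) "Z;"
```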

Page 58: Understanding parser combinators

Demo

Page 59: Understanding parser combinators

Part 5:

Improving the error messages

Page 60: Understanding parser combinators

Named parsers

A Parser<char> now carries:

• Name: “Digit”

• Parsing function: applied to the input

Page 61: Understanding parser combinators

Named parsers

let ( <?> ) = setLabel // infix version

run parseDigit "ABC" // without the label

// Error parsing "9" : Unexpected 'A'

let parseDigit_WithLabel = anyOf ['0'..'9'] <?> "digit"

run parseDigit_WithLabel "ABC" // with the label

// Error parsing "digit" : Unexpected 'A'
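setLabel is used on the slide but not defined. A minimal sketch of the idea: give the parser a label field and have failures report that label. The record layout, the two-part Failure case, and the simplified digit parser below are all assumptions for illustration and differ from the earlier single-string Failure.

```fsharp
open System

type ParserLabel = string
type ParserError = string

// Assumed shape: a failure reports which parser failed and why
type Result<'a> =
    | Success of 'a
    | Failure of ParserLabel * ParserError

// Assumed shape: the parser carries its own name
type Parser<'a> =
    { parseFn: string -> Result<'a * string>
      label: ParserLabel }

let run parser input = parser.parseFn input

// Replace the label, both on the parser and in any failure it produces
let setLabel parser newLabel =
    let newInnerFn input =
        match parser.parseFn input with
        | Success s -> Success s
        | Failure (_, err) -> Failure (newLabel, err)
    { parseFn = newInnerFn; label = newLabel }

let ( <?> ) parser newLabel = setLabel parser newLabel   // infix version

// Simplified digit parser with a placeholder label
let parseDigit =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure ("unknown", "No more input")
        else
            let first = input.[0]
            if Char.IsDigit(first) then
                Success (first, input.[1..])
            else
                Failure ("unknown", sprintf "Unexpected '%c'" first)
    { parseFn = innerFn; label = "unknown" }

let parseDigitWithLabel = parseDigit <?> "digit"

let unlabelled = run parseDigit "ABC"
let labelled = run parseDigitWithLabel "ABC"
```

Keeping the label separate from the error text lets a printing function format messages such as those on the slide.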

Page 62: Understanding parser combinators

Extra input context

The input to a Parser<char> now carries:

• Stream of characters

• Line, Column

Page 63: Understanding parser combinators

Extra input context

run pint "-Z123"
// Line:0 Col:1 Error parsing integer
// -Z123
//  ^Unexpected 'Z'

run pfloat "-123Z45"
// Line:0 Col:4 Error parsing float
// -123Z45
//     ^Unexpected 'Z'

Page 64: Understanding parser combinators

Part 6:

Building a JSON Parser

Page 65: Understanding parser combinators
Page 66: Understanding parser combinators

// A type that represents the previous diagram

type JValue =

| JString of string

| JNumber of float

| JObject of Map<string, JValue>

| JArray of JValue list

| JBool of bool

| JNull

Page 67: Understanding parser combinators
Page 68: Understanding parser combinators

Parsing JSON Null

Page 69: Understanding parser combinators

// new helper operator.
let (>>%) p x =
    p |>> (fun _ -> x) // runs parser p, but ignores the result

// Parse a "null"
let jNull =
    pstring "null"
    >>% JNull // map to JNull
    <?> "null" // give it a label

Page 70: Understanding parser combinators

Parsing JSON Bool

Page 71: Understanding parser combinators

// Parse a boolean
let jBool =
    let jtrue = pstring "true"
                >>% JBool true // map to JBool
    let jfalse = pstring "false"
                 >>% JBool false // map to JBool
    // choose between true and false
    jtrue <|> jfalse
    <?> "bool" // give it a label

Page 72: Understanding parser combinators

Parsing a JSON String

Page 73: Understanding parser combinators
Page 74: Understanding parser combinators

Call this "unescaped char"

Page 75: Understanding parser combinators

/// Parse an unescaped char
let jUnescapedChar =
    let label = "char"
    satisfy (fun ch -> (ch <> '\\') && (ch <> '\"') ) label

Page 76: Understanding parser combinators

Call this "escaped char"

Page 77: Understanding parser combinators

let jEscapedChar =
    [ // each item is (stringToMatch, resultChar)
      ("\\\"",'\"') // quote
      ("\\\\",'\\') // reverse solidus
      ("\\/",'/') // solidus
      ("\\b",'\b') // backspace
      ("\\f",'\f') // formfeed
      ("\\n",'\n') // newline
      ("\\r",'\r') // cr
      ("\\t",'\t') // tab
    ]
    // convert each pair into a parser
    |> List.map (fun (toMatch,result) -> pstring toMatch >>% result)
    // and combine them into one
    |> choice
    <?> "escaped char" // set label

Page 78: Understanding parser combinators

Call this "unicode char"

Page 79: Understanding parser combinators

"unescaped char" or

"escaped char" or

"unicode char"

Page 80: Understanding parser combinators

let quotedString =
    let quote = pchar '\"' <?> "quote"
    let jchar =
        jUnescapedChar <|> jEscapedChar <|> jUnicodeChar
    // set up the main parser
    quote >>. manyChars jchar .>> quote

let jString =
    // wrap the string in a JString
    quotedString
    |>> JString // convert to JString
    <?> "quoted string" // add label

Page 81: Understanding parser combinators

Parsing a JSON Number

Page 82: Understanding parser combinators
Page 83: Understanding parser combinators

"int part"

"sign part"

Page 84: Understanding parser combinators

let optSign = opt (pchar '-')

let zero = pstring "0"

let digitOneNine =
    satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"

let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest

// set up the integer part
let intPart = zero <|> nonZeroInt

Page 85: Understanding parser combinators

"fraction part"

Page 86: Understanding parser combinators

// set up the fraction part
let point = pchar '.'

let fractionPart =
    point >>. manyChars1 digit

Page 87: Understanding parser combinators

"exponent part"

Page 88: Understanding parser combinators

// set up the exponent part
let e = pchar 'e' <|> pchar 'E'

let optPlusMinus = opt (pchar '-' <|> pchar '+')

let exponentPart =
    e >>. optPlusMinus .>>. manyChars1 digit

Page 89: Understanding parser combinators

"sign part"

"int part"

"fraction part"

"exponent part"

Page 90: Understanding parser combinators

// set up the main JNumber parser

optSign

.>>. intPart

.>>. opt fractionPart

.>>. opt exponentPart

|>> convertToJNumber // not shown

<?> "number" // add label
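The slide marks convertToJNumber as "not shown". Given the parsers above, its input would be the nested tuple (((sign, int), fraction), exponent). One possible sketch, which simply rebuilds the number's text and hands it to the .NET float parser, is below; the body is an assumption, and only the JNumber case of JValue is repeated here.

```fsharp
open System
open System.Globalization

type JValue =
    | JNumber of float   // only the case needed for this sketch

// Hypothetical implementation: reassemble the text, then parse it
let convertToJNumber (((optSign, intPart), fractionPart), expPart) =
    let signToString (sign: char option) =
        match sign with
        | Some ch -> string ch
        | None -> ""
    let fractionStr =
        match fractionPart with
        | Some digits -> "." + digits
        | None -> ""
    let exponentStr =
        match expPart with
        | Some (expSign, digits) -> "e" + signToString expSign + digits
        | None -> ""
    let text = signToString optSign + intPart + fractionStr + exponentStr
    // invariant culture so '.' is always the decimal separator
    JNumber (Double.Parse(text, CultureInfo.InvariantCulture))

// Illustrative inputs, shaped like the output of the four .>>. steps
let full = convertToJNumber (((Some '-', "12"), Some "5"), Some (Some '+', "2"))   // "-12.5e+2"
let plain = convertToJNumber (((None, "0"), None), None)                          // "0"
```

Delegating to Double.Parse keeps the sketch short; a stricter version might compute the value from the parts directly.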

Page 91: Understanding parser combinators

Parsing JSON Arrays and Objects

Page 92: Understanding parser combinators

Completing the JSON Parser

Page 93: Understanding parser combinators
Page 94: Understanding parser combinators

// the final parser combines the others together
let jValue = choice
    [
    jNull
    jBool
    jNumber
    jString
    jArray
    jObject
    ]

Page 95: Understanding parser combinators

Demo: the JSON parser in action

Page 96: Understanding parser combinators

Summary

• Treating a function like an object

– Returning a function from a function

– Wrapping a function in a type

• Working with a "recipe" (aka "effect")

– Combining recipes before running them.

• The power of combinators

– A few basic combinators: "andThen", "orElse", etc.

– Complex parsers are built from smaller components.

• Combinator libraries are small but powerful

– Fewer than 500 lines for the combinator library

– Fewer than 300 lines for the JSON parser itself

Page 97: Understanding parser combinators

Want more?

• For a production-ready library for F#, search for "fparsec"

• There are similar libraries for other languages

Page 98: Understanding parser combinators

Thanks!

@ScottWlaschin

fsharpforfunandprofit.com/parser

Contact me

Slides and video here

Let us know if you need help with F#

