Parsing for Fun and Profit

Slides from my talk Parsing for Fun and Profit. The code is available here: https://github.com/patchspace/parsing_for_fun_and_profit

Parsing for Fun and Profit (but mainly fun)

Ash Moran
ash.moran@patchspace.co.uk

PatchSpace Ltd
Saturday, 23 February 13

What?

Parsing

Adding structure and meaning to text

Parsing Human Languages

Jake stretched his legs
“Jake”, “stretched”, “his”, “legs”
“Jake”<noun>, “stretched”<verb, past>, “his”<possessive pronoun>, “legs”<noun>
“Jake”<noun, subject>, “stretched”, (“his”, “legs”)<noun phrase, object>

Parsing Computer Languages

“foo = bar + 123”
“foo”, “=”, “bar”, “+”, “123”
“foo”<var>, “=”<assignment_op>, “bar”<var>, “+”<op_plus>, “123”<int_literal>
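
The splitting and tagging steps above can be sketched with StringScanner from Ruby's standard library. This is a minimal sketch rather than the talk's code: the regular expression behind each tag is an assumption, and only the four token kinds shown on the slide are recognised.

```ruby
require "strscan"

# Each rule pairs a tag (named as on the slide) with the pattern that
# recognises it. Rules are tried in order at the current position.
TOKEN_RULES = [
  [:int_literal,   /\d+/],
  [:var,           /[a-z_][a-zA-Z0-9_]*/],
  [:assignment_op, /=/],
  [:op_plus,       /\+/]
].freeze

# Splits source text into [tag, text] pairs. Whitespace separates tokens
# but is not itself a token; any other unrecognised character is an error.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)

    tag, pattern = TOKEN_RULES.find { |_tag, candidate| scanner.match?(candidate) }
    raise ArgumentError, "unexpected #{scanner.peek(1).inspect} at #{scanner.pos}" if tag.nil?

    tokens << [tag, scanner.scan(pattern)]
  end
  tokens
end

p tokenize("foo = bar + 123")
# => [[:var, "foo"], [:assignment_op, "="], [:var, "bar"], [:op_plus, "+"], [:int_literal, "123"]]
```

Because the rules are tried in order, more specific patterns (such as keywords, if they were added) would need to come before the general `var` rule.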

Why?

Not just compiling!
Compilers breathe fire.

Pretty Printing

gofmt

http://gofmt.com/

Code Smell Detectors
https://rubygems.org/gems/reek

Other ideas
Code metrics
Bug detectors
Domain-specific languages
Language translators (e.g. Ruby -> PHP)
Code obfuscators
Alternative syntaxes (e.g. CoffeeScript)
Refactoring tools

How?

Step 1
3-year computer science degree

Lexing/Tokenising

if x > 100 then return “big” else return “small”

Tree Building
if x > 100 then return “big” else return a + b

[Diagram: parse tree of the statement above, with nodes if, x, >, 100, then, return, “big”, else, return, a, +, b]
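
Tree building can be sketched as a recursive-descent parser in which each grammar rule is a method that consumes tokens and returns a nested array. The token tags (kw, id, int, str, op), the grammar in the comments and the tree shape are all hypothetical, and the token list is hard-coded so the sketch stands alone. Binary operators are treated as left-associative with no precedence levels, which is enough for this statement.

```ruby
# Hypothetical token list for: if x > 100 then return "big" else return a + b
TOKENS = [
  [:kw, "if"], [:id, "x"], [:op, ">"], [:int, "100"],
  [:kw, "then"], [:kw, "return"], [:str, "big"],
  [:kw, "else"], [:kw, "return"], [:id, "a"], [:op, "+"], [:id, "b"]
].freeze

# Recursive-descent parser; each method implements one grammar rule:
#   statement  := "if" expression "then" statement "else" statement
#               | "return" expression
#   expression := operand (op operand)*
#   operand    := id | int | str
class TreeBuilder
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  def parse
    tree = statement
    raise ArgumentError, "unexpected trailing tokens" if @pos < @tokens.length
    tree
  end

  private

  def peek
    @tokens[@pos]
  end

  # Returns the current token and moves past it.
  def advance
    token = @tokens[@pos]
    raise ArgumentError, "unexpected end of input" if token.nil?

    @pos += 1
    token
  end

  def expect_keyword(word)
    token = advance
    raise ArgumentError, "expected #{word}, got #{token.inspect}" unless token == [:kw, word]
  end

  def statement
    case peek
    when [:kw, "if"]
      advance
      condition = expression
      expect_keyword("then")
      then_branch = statement
      expect_keyword("else")
      else_branch = statement
      [:if, condition, then_branch, else_branch]
    when [:kw, "return"]
      advance
      [:return, expression]
    else
      raise ArgumentError, "unexpected token #{peek.inspect}"
    end
  end

  # Folds operators to the left: a + b + c becomes [:+, [:+, a, b], c].
  def expression
    left = operand
    while peek && peek.first == :op
      operator = advance.last.to_sym
      left = [operator, left, operand]
    end
    left
  end

  def operand
    type, text = advance
    case type
    when :id  then [:var, text]
    when :int then [:int, Integer(text, 10)]
    when :str then [:str, text]
    else raise ArgumentError, "expected an operand, got #{text.inspect}"
    end
  end
end

p TreeBuilder.new(TOKENS).parse
# => [:if, [:>, [:var, "x"], [:int, 100]], [:return, [:str, "big"]], [:return, [:+, [:var, "a"], [:var, "b"]]]]
```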

Parsing Expression Grammars

Like regular expressions, but can handle recursion, e.g. HTML
Not actually that much harder to use
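
To make the recursion point concrete, here is a hand-written PEG-style recogniser for a tiny, hypothetical subset of HTML (lower-case tags, no attributes, no self-closing tags). It is plain Ruby rather than Treetop so it runs without gems: each grammar rule becomes one method, ordered choice becomes `||`, and a failed rule rewinds the scanner. The rule letting an `element` contain another `element` is the arbitrary-depth nesting that a classic, truly regular expression cannot describe.

```ruby
require "strscan"

# Hypothetical PEG for a tiny HTML subset (lower-case tags, no attributes):
#   document <- element !.
#   element  <- "<" name ">" content* "</" name ">"
#   content  <- element / text
#   text     <- [^<]+
# Each rule is a method that returns a node on success, or rewinds the
# scanner and returns nil on failure (PEG-style backtracking).
class TagParser
  def initialize(input)
    @s = StringScanner.new(input)
  end

  # Returns [name, children] for the root element, or nil if the input does not match.
  def parse
    tree = element
    tree if tree && @s.eos?
  end

  private

  def element
    start = @s.pos
    open_name = @s.scan(/<([a-z]+)>/) && @s[1]
    return rewind(start) unless open_name

    children = []
    while (child = content) # element can contain element: the recursive step
      children << child
    end

    close_name = @s.scan(%r{</([a-z]+)>}) && @s[1]
    # Requiring matching names is a semantic check on top of the plain PEG.
    return rewind(start) unless close_name == open_name

    [open_name, children]
  end

  # Ordered choice: try a nested element first, then plain text.
  def content
    element || text
  end

  def text
    @s.scan(/[^<]+/)
  end

  def rewind(position)
    @s.pos = position
    nil
  end
end

p TagParser.new("<b>bold <i>and italic</i></b>").parse
# => ["b", ["bold ", ["i", ["and italic"]]]]
p TagParser.new("<b><i>oops</b></i>").parse
# => nil
```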

Regexes and HTML

Treetop PEG grammar

Doing Sums
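
The demo code itself is in the repository linked below. As a stand-alone illustration only, here is a PEG-style evaluator for sums in plain Ruby (not Treetop), assuming "sums" means non-negative integer literals combined with + and -, plus brackets to show a recursive rule. The grammar is hypothetical; each rule is one method, and values are computed as the rules match.

```ruby
require "strscan"

# Hypothetical PEG (ordered choice "/" tries the left option first):
#   sum  <- term (("+" / "-") term)*
#   term <- number / "(" sum ")"
# Spaces are skipped before each token, and each rule returns the value
# of the text it matched, so parsing and evaluation happen together.
class SumEvaluator
  def initialize(source)
    @s = StringScanner.new(source)
  end

  def evaluate
    result = sum
    skip_spaces
    raise ArgumentError, "unexpected input at offset #{@s.pos}" unless @s.eos?
    result
  end

  private

  def skip_spaces
    @s.skip(/\s*/)
  end

  # Repetition folds left, so 10 - 2 - 3 is (10 - 2) - 3.
  def sum
    total = term
    loop do
      skip_spaces
      operator = @s.scan(/[+-]/) or break
      total = operator == "+" ? total + term : total - term
    end
    total
  end

  def term
    skip_spaces
    if (digits = @s.scan(/\d+/))
      digits.to_i
    elsif @s.scan(/\(/)
      inner = sum # recursion: a bracketed term contains a whole sum
      skip_spaces
      raise ArgumentError, "expected ')' at offset #{@s.pos}" unless @s.scan(/\)/)
      inner
    else
      raise ArgumentError, "expected a number or '(' at offset #{@s.pos}"
    end
  end
end

puts SumEvaluator.new("10 - (2 + 3) - 1").evaluate # prints 4
```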

Switch to Sublime Text, idiot

Code is now available: https://github.com/patchspace/parsing_for_fun_and_profit/

A Ruby Syntax Highlighter

What

A tool to read in simple Ruby source and output syntax highlighted HTML

Why

Because I thought it would be fun
It was
Because I thought it would be easy…

How
Build a parse tree of the Ruby source
Walk the tree and spit out a <span> element for each bit of text
Oh yes, make sure each line goes in <div> and <pre> tags
Wrap it in <html>
And for bonus points, do some fancy method highlighting
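
The steps above can be sketched as a small tree walk. The node shape, the CSS class names and the exact nesting of <div> and <pre> per line are assumptions for illustration (the slide does not pin them down), and the "fancy method highlighting" bonus is left out.

```ruby
# Minimal HTML escaping so characters such as "<" in the source cannot
# break the generated markup.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical tree shape: a node is either a leaf [type, "text"] or a
# branch [type, [child, ...]]. Newlines are assumed to be [:newline, "\n"] leaves.
def leaves(node)
  type, value = node
  return [[type, value]] if value.is_a?(String)

  value.flat_map { |child| leaves(child) }
end

# Walks the tree left to right, emitting one <span> per leaf and starting
# a new line (a <div> holding a <pre>) at each newline leaf.
def highlight(tree)
  lines = [[]]
  leaves(tree).each do |type, text|
    if type == :newline
      lines << []
    else
      lines.last << %(<span class="#{type}">#{escape_html(text)}</span>)
    end
  end
  rows = lines.map { |spans| %(<div class="line"><pre>#{spans.join}</pre></div>) }
  "<html><body>\n#{rows.join("\n")}\n</body></html>"
end

# Hypothetical tree for a three-line method definition.
SAMPLE_TREE = [:program, [
  [:method_def, [
    [:keyword, "def"], [:space, " "], [:method_name, "greet"], [:newline, "\n"],
    [:space, "  "], [:string, '"<hi>"'], [:newline, "\n"],
    [:keyword, "end"]
  ]]
]].freeze

puts highlight(SAMPLE_TREE)
```

Flattening to leaves first keeps the line-splitting logic separate from the recursion, so the walk does not need to know where lines begin and end.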

Switch to Chrome, idiot

Switch to Sublime Text again, idiot

Code is now available: https://github.com/patchspace/parsing_for_fun_and_profit/

We’re doing this the hard way

Ruby’s grammar is too complex, and too informally specified, to implement easily as a PEG
Tools for parsing Ruby already exist

Ripper (Ruby 1.9.3)
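
Ripper ships with Ruby, so it can stand in for a hand-written Ruby parser. Below is a minimal highlighter sketch using `Ripper.lex`, which returns a flat token stream rather than a tree. The CSS class naming (the event name minus its "on_" prefix) is an assumption for illustration, per-line <div> wrapping is omitted, and the code targets a current Ruby (2.5 or later for `delete_prefix`), not 1.9.3.

```ruby
require "erb"
require "ripper"

# Ripper.lex returns one entry per token, [[line, column], event, text, state]
# (the trailing lexer state appears in Ruby 2.5 and later), including
# whitespace and newlines, so the token texts join back into the source.
# The CSS class here is the event name minus its "on_" prefix,
# e.g. :on_kw -> "kw", :on_ident -> "ident" (a naming choice for this sketch).
def highlight_ruby(source)
  spans = Ripper.lex(source).map do |_position, event, text, *_state|
    css_class = event.to_s.delete_prefix("on_")
    %(<span class="#{css_class}">#{ERB::Util.html_escape(text)}</span>)
  end
  "<html><body><pre>#{spans.join}</pre></body></html>"
end

SAMPLE_SOURCE = <<~'RUBY'
  def greet(name)
    "Hello, " + name
  end
RUBY

puts highlight_ruby(SAMPLE_SOURCE)
```

Because every token, whitespace included, gets its own span, stripping the tags and unescaping the text gives back the original source, which is a handy check when writing a highlighter.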

Learn more!

Skip theoretical physics, start by playing with Lego

Do more
Ideas you might like to try:

CSV parser
JSON parser (return arrays & hashes)
XML parser
JSON highlighter
A simple JavaScript minifier (just kill whitespace)
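
As a starting point for the JSON parser idea, here is a recursive-descent sketch that returns plain arrays and hashes. It makes one stated simplification: \uXXXX escapes must name a non-surrogate code point, so characters beyond the Basic Multilingual Plane (written as surrogate pairs) are not supported. Error messages are minimal. Comparing its output with the standard library's `JSON.parse` is an easy way to test it.

```ruby
require "strscan"

# Recursive-descent JSON parser that returns plain Ruby arrays and hashes.
# Simplification for this sketch: \uXXXX escapes must name a non-surrogate
# code point, so characters outside the Basic Multilingual Plane are rejected.
class TinyJSON
  ESCAPES = {
    '"' => '"', "\\" => "\\", "/" => "/", "b" => "\b",
    "f" => "\f", "n" => "\n", "r" => "\r", "t" => "\t"
  }.freeze
  NUMBER = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/

  def initialize(text)
    @s = StringScanner.new(text)
  end

  def parse
    result = value
    skip_ws
    error("trailing characters") unless @s.eos?
    result
  end

  private

  def skip_ws
    @s.skip(/[ \t\r\n]*/)
  end

  def error(message)
    raise ArgumentError, "#{message} at offset #{@s.pos}"
  end

  # value <- object / array / string / number / true / false / null
  def value
    skip_ws
    if @s.scan(/\{/) then object
    elsif @s.scan(/\[/) then array
    elsif @s.scan(/"/) then string
    elsif (number = @s.scan(NUMBER))
      number.match?(/[.eE]/) ? Float(number) : Integer(number, 10)
    elsif @s.scan(/true\b/) then true
    elsif @s.scan(/false\b/) then false
    elsif @s.scan(/null\b/) then nil
    else error("unexpected input")
    end
  end

  # Called after "{": members separated by ",", closed by "}".
  def object
    result = {}
    skip_ws
    return result if @s.scan(/\}/)

    loop do
      skip_ws
      error("expected a string key") unless @s.scan(/"/)
      key = string
      skip_ws
      error("expected ':'") unless @s.scan(/:/)
      result[key] = value
      skip_ws
      break if @s.scan(/\}/)
      error("expected ',' or '}'") unless @s.scan(/,/)
    end
    result
  end

  # Called after "[": values separated by ",", closed by "]".
  def array
    result = []
    skip_ws
    return result if @s.scan(/\]/)

    loop do
      result << value
      skip_ws
      break if @s.scan(/\]/)
      error("expected ',' or ']'") unless @s.scan(/,/)
    end
    result
  end

  # Called after the opening quote; consumes up to and including the closing one.
  def string
    out = +""
    loop do
      if (chunk = @s.scan(/[^"\\]+/))
        out << chunk
      elsif @s.scan(/"/)
        return out
      elsif @s.scan(/\\u(\h{4})/)
        out << @s[1].hex.chr(Encoding::UTF_8)
      elsif @s.scan(/\\(.)/)
        out << ESCAPES.fetch(@s[1]) { error("bad escape") }
      else
        error("unterminated string")
      end
    end
  end
end

p TinyJSON.new('{"tags": ["ruby", "parsing"], "n": -12, "ok": true}').parse
```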

Thank you

Ash Moran
ash.moran@patchspace.co.uk

PatchSpace Ltd