Packrat Parsers

Its a long weekend, so I wrote a packrat parser! (AKA Parsing Expression Grammar, or PEG.)

I implemented the example grammar from the original paper. Here its parsing the string 2*(3+4):

Holy cow thats a lot of steps for a 7 character string. No wonder modern compilers take ages to run!

Packrat parsers are neat because unlike other parsers you can combine your tokeniser and lexer into a single grammar. There's no need to define token rules separately from your semantic rules.

So for example, the definition of an array literal can simply be:

ArrayLiteral = "[" ListOf<AssignmentExpressionOrElision, ","> "]"  

(From this ES5 grammar)

If you want to learn more, here's an online version of Ohm (a clean, full featured JS packrat parser) that you can mess around with: http://www.cdglabs.org/ohm/visualizer/.

Here's a big pile of academic papers: http://bford.info/packrat/

Finally the simple code I wrote is here. My full PEG implementation including the grammar fits in less than 200 lines of code.