fosskers/parcom: Simple parser combinators for Common Lisp.

by oqtey
fosskers/parcom: Simple parser combinators for Common Lisp.

parcom is a consise Parser Combinator library in the style of Haskell’s parsec
and Rust’s nom.

(in-package :parcom)
(parse (*> (string "Tempus") #'space (string "fugit")) "Tempus fugit.")
fugit

parcom operates strictly on strings, not streamed byte data, but is otherwise
“zero copy” in that extracted substrings of the original input are not
reallocated.

parcom has no dependencies.

Compiler Status
SBCL
ECL
Clasp
ABCL
CCL
Clisp
Allegro
LispWorks

The examples below use (in-package :parcom) for brevity, but it’s assumed that
you’ll use a local nickname like pc in your actual code. Further, most examples
run the parsers with parse, but occasionally funcall is used instead to
demonstrate what the remaining input would be after the parse succeeded. You
will generally be using parse in your own code.

Types and Running Parsers

All parsers have the function signature string -> parser, where parser is a
struct that holds the value of the parsing success alongside the remaining input
string.

(in-package :parcom)
(funcall (string "Hello") "Hello there")
#S(PARSER :INPUT " there" :VALUE "Hello")

Of course, a parser might fail, in which case a failure struct is returned:

(in-package :parcom)
(funcall (string "Hello") "Bye!")
#S(FAILURE :EXPECTED "string: Hello" :ACTUAL "string: Bye!")

In general though we call parse to fully run some combined parsers and yield
the final output:

(in-package :parcom)
(apply #'+ (parse (sep (char #\.) #'unsigned) "123.456.789!"))
1368

parse otherwise ignores any final, unconsumed input. It will also raise a
Condition if the parsing failed.

A “parser” is a function that consumes some specific input and yields a single
result.

Parse a given character.

(in-package :parcom)
(parse (char #\a) "apple")
#\a

Parse the given string.

(in-package :parcom)
(parse (string "Hello") "Hello there!")
Hello

Parse any character.

(in-package :parcom)
(parse #'any "Hello there!")
#\H

Parse any character expect the one you don’t want.

(in-package :parcom)
(parse (anybut #\!) "Hello there!")
#\H
(in-package :parcom)
(funcall (anybut #\H) "Hello there!")
#S(FAILURE :EXPECTED "anybut: not H" :ACTUAL "Hello there!")

Parse a hex character of any case.

(in-package :parcom)
(funcall (many #'hex) "abcd0efgh")
#S(PARSER :INPUT "gh" :VALUE (#\a #\b #\c #\d #\0 #\e #\f))

Recognize the end of the input.

(in-package :parcom)
(parse #'eof "")
T
(in-package :parcom)
(parse (*> (string "Mālum") #'eof) "Mālum")
T
(in-package :parcom)
(funcall (*> (string "Mālum") #'eof) "Mālum rubrum")
#S(FAILURE :EXPECTED "the end of the input" :ACTUAL " rubrum")

Parse a positive integer into a fixnum.

(in-package :parcom)
(parse #'unsigned "44")
44

Parse a positive or negative integer into a fixnum.

(in-package :parcom)
(parse #'integer "-44")
-44

Parse a positive or negative floating point number into a float.

(in-package :parcom)
(parse #'float "123.0456")
123.0456

Matches a single newline character.

(in-package :parcom)
(let ((s (concatenate 'cl:string '(#\newline #\a #\b #\c)))) ; "\nabc"
  (parse #'newline s))
#\Newline

Parse 0 or more ASCII whitespace and tab characters.

(in-package :parcom)
(length (parse #'space "   Salvē!"))
3

Parse 1 or more ASCII whitespace and tab characters.

(in-package :parcom)
(length (parse #'space1 "   Salvē!"))
3
(in-package :parcom)
(funcall #'space1 "Salvē!")
#S(FAILURE :EXPECTED "space1: at least one whitespace" :ACTUAL "Salvē!")

Parse 0 or more ASCII whitespace, tabs, newlines, and carriage returns.

(in-package :parcom)
(length (parse #'multispace (concatenate 'cl:string '(#\tab #\newline #\tab))))
3

Parse 1 or more ASCII whitespace, tabs, newlines, and carriage returns.

(in-package :parcom)
(length (parse #'multispace1 (concatenate 'cl:string '(#\tab #\newline #\tab))))
3
(in-package :parcom)
(funcall #'multispace1 "Ārcus")
#S(FAILURE
   :EXPECTED "multispace1: at least one space-like character"
   :ACTUAL "Ārcus")

These always yield a substring borrowed directly from the original input.

Take n characters from the input.

(in-package :parcom)
(parse (take 3) "Arbor")
Arb

Take characters while some predicate holds.

(in-package :parcom)
(parse (take-while (lambda (c) (equal #\a c))) "aaabbb")
aaa

take-while1 is like take-while, but must yield at least one character.

(in-package :parcom)
(funcall (take-while1 (lambda (c) (equal #\a c))) "bbb")
#S(FAILURE :EXPECTED "take-while1: at least one success" :ACTUAL "bbb")

Consume the rest of the input. Always succeeds.

(in-package :parcom)
(parse (<*> (string "Salvē") (*> #'space #'rest)) "Salvē domine!")
("Salvē" "domine!")

“Combinators” combine child parsers together to form compound results. They
allow us to express intent like “parse this then that” and “parse this, then
maybe that, but only if…” etc.

Run multiple parsers one after another, but yield the value of the rightmost
one. right is an alias.

(in-package :parcom)
(funcall (*> (char #\!) #'unsigned) "!123?")
#S(PARSER :INPUT "?" :VALUE 123)

Run multiple parsers one after another, but yield the value of the leftmost
one. left is an alias.

(in-package :parcom)
(funcall (<* (char #\!) #'unsigned) "!123?")
#S(PARSER :INPUT "?" :VALUE #\!)

Combination of parsers yielding all results as a list. all is an alias.

(in-package :parcom)
(parse (<*> #'unsigned (char #\!) #'unsigned) "123!456")
(123 #\! 456)

This library does not offer a currying mechanism, so the technique usually
available in Haskell of fmap’ing a function over chain of <*> must be done
instead with apply:

(in-package :parcom)
(apply #'+ (parse (<*> #'unsigned (*> (char #\!) #'unsigned)) "123!456"))
579

Run some parser, but substitute its inner value with something else if parsing
was successful. instead is an alias.

(in-package :parcom)
(parse (<$ :roma (string "Roma")) "Roma!")
:ROMA

Accept the results of the first parser from a group to succeed. Can combine as
many parsers as you want.

(in-package :parcom)
(parse (alt (string "dog") (string "cat")) "cat")
cat

Yield nil if the parser failed, but don’t fail the whole process nor consume any
input.

(in-package :parcom)
(parse (opt (string "Ex")) "Exercitus")
Ex
(in-package :parcom)
(parse (opt (string "Ex")) "Facēre")
NIL

A main parser flanked by two other ones. Only the value of the main parser is
kept. Good for parsing backets, parentheses, etc.

(in-package :parcom)
(parse (between (char #\!) (string "Salvē") (char #\!)) "!Salvē!")
Salvē

many parses 0 or more occurrences of a parser. many1 demands that at least one
parse succeeds or a Condition will be raised.

(in-package :parcom)
(parse (many (alt (string "ovēs") (string "avis"))) "ovēsovēsavis!")
("ovēs" "ovēs" "avis")

sep parses 0 or more instances of a parser separated by some sep parser. sep1
demands that at least one parse succeeds or a Condition will be raised.

(in-package :parcom)
(parse (sep (char #\!) (string "pilum")) "pilum!pilum!pilum.")
("pilum" "pilum" "pilum")

Critically, if a separator is detected, the parent parser must also then succeed
or the entire combination fails. For example, this will not parse due to the !
on the end:

(in-package :parcom)
(parse (sep (char #\!) (string "pilum")) "pilum!pilum!pilum!")

For more lenient behaviour regarding the separator, see sep-end.

The same as sep, but the separator may appear at the end of the final “parent”.
Likewise, sep-end1 demands that at least one parse of the parent succeeds.

(in-package :parcom)
(funcall (sep-end (char #\!) (string "pilum")) "pilum!pilum!pilum!scūtum")
#S(PARSER :INPUT "scūtum" :VALUE ("pilum" "pilum" "pilum"))

Parse some parser 0 or more times, but throw away all the results.

(in-package :parcom)
(parse (*> (skip (char #\!)) #'unsigned) "!!!123")
123

Yield the value of a parser, but don’t consume the input.

(in-package :parcom)
(funcall (peek (string "he")) "hello")
#S(PARSER :INPUT "hello" :VALUE "he")

Apply a parser a given number of times and collect the results as a list.

(in-package :parcom)
(funcall (count 3 (char #\a)) "aaaaaa")
#S(PARSER :INPUT "aaa" :VALUE (#\a #\a #\a))

If the given parser was successful, return the consumed input as a string
instead.

(in-package :parcom)
(funcall (recognize (<*> (string "hi") #'unsigned)) "hi123there")
#S(PARSER :INPUT "there" :VALUE "hi123")

Is a given string empty?

(in-package :parcom)
(empty? "")
T

Is a given character a number from 0 to 9?

(in-package :parcom)
(digit? #\7)
T

Apply a pure function to the inner contents of a parser.

(in-package :parcom)
(fmap #'1+ (funcall #'unsigned "1"))
#S(PARSER :INPUT "" :VALUE 2)

Yield a function that ignores its input and returns some original seed.

(in-package :parcom)
(funcall (const 1) 5)
1

By depending on the optional parcom/json system, you can parse simple JSON or
include parcom-compatible JSON parsers into your own custom parsing code.

(in-package :parcom/json) is used below for brevity, but it’s assumed that in
your own code you will use a nickname, perhaps pj.

If you don’t care about the individual parsers per se and just want to simply
parse some JSON, use parse.

Conversions:

JSON Lisp
true T
false NIL
Array Vector
Object Hash Table
Number double-float
String String
null :NULL

As with the parent parcom library, parcom/json works strictly off of strings and
makes no attempt to be clever or high-performance. For a more “industrial
strength” JSON parsing library, see jzon. The strength of parcom/json is in its
simplicity and light weight.

Attempt to parse any JSON value. Analogous to parse from the main library.

(in-package :parcom/json)
(parse "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]}")
#
(in-package :parcom/json)
(parse "[1.9,true,3e+7,\"hi\",[4],null]")
#(1.9d0 T 3.0d7 "hi" #(4) :NULL)

Non-ascii and unicode characters are supported:

(in-package :parcom/json)
(parse "\"hēllお🐂\\u03B1\"")
hēllお🐂α

Parse any kind of JSON (the actual parser).

(in-package :parcom/json)
(json "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]}  ")
#S(P:PARSER :INPUT "  " :VALUE #)

There are other subparsers exposed, but they are left out here for brevity.
Please consult the source code if you need them.

The whole point of Parser Combinators is that it becomes simple to write your
own parsing functions. Recall that a “fully realized” parser has the signature
string -> parser. In the simplest case, a parser of yours could look like:

(in-package :parcom)

(defun excited-apple (input)
  (funcall (<* (string "Mālum") (char #\!)) input))

(funcall #'excited-apple "Mālum! Ō!")
#S(PARSER :INPUT " Ō!" :VALUE "Mālum")

Wherein you utilize the combinators provided by this library to build up
composite parsers that are useful to you.

You can also parameterize your parsers, similar to parsers like take or
combinators like count:

(in-package :parcom)

(defun excited-apple (input)
  (funcall (<* (string "Mālum") (char #\!)) input))

(defun excited-apples (n)
  "Parse a certain number of excited apples."
  (lambda (input)
    (funcall (count #'excited-apple n) input)))

(funcall (excited-apples 3) "Mālum!Mālum!Mālum!Mālum!")
#S(PARSER :INPUT "Mālum!" :VALUE ("Mālum" "Mālum" "Mālum"))

So, if your parser is parameterized by some initial argument, it has to return a
lambda that accepts an input string.

You can use fail within more complex hand-written parsers to explicitly fail
with your own diagnostics:

(in-package :parcom)

(defun three-sad-pears (input)
  (let ((res (funcall (many (string "Pirum trīste")) input)))
    (cond ((failure-p res)
           (fail "Three sad pears" "No pears at all!"))
          ((< (length (parser-value res)) 3)
           (fail "Three sad pears" "Not enough pears"))
          ((> (length (parser-value res)) 3)
           (fail "Three sad pears" "Way too many pears"))
          (t res))))

(three-sad-pears "Pirum trīste")
#S(FAILURE :EXPECTED "Three sad pears" :ACTUAL "Not enough pears")

Notice the usage of parser-value to access the current inner success value of
the parser result. parser-input is likewise used to access the remaining input.

Related Posts

Leave a Comment