MaxPC API Documentation
maxpc (Package)
Max’s Parser Combinators¹ is a simple and pragmatic library for writing parsers and lexers based on combinatory parsing. MaxPC is capable of parsing deterministic, context-free languages, provides powerful tools for parse tree transformation and error handling, and can operate on sequences and streams. It supports unlimited backtracking, but does not implement Packrat Parsing. Instead, MaxPC achieves good performance through its optimized primitives, and explicit separation of matching and capturing input. In practice, MaxPC parsers perform better on typical computer languages—when compared to Packrat parsers—at the expense of not producing linear-time parsers.²
- 1. MaxPC is a complete rewrite of MPC with was in turn a fork of Drew Crampsie’s Smug.
- 2. See MaxPC: Why? How? / Packrat Parsing on why the book keeping costs of Packrat parsing diminish the gain in execution time for typical grammars and workloads.
Basic Concepts
MaxPC parsers are functions that accept an input as their only argument and return three values: the remaining input or nil
, a result value or nil
, and a generalized boolean that indicates if a result value is present. The type of a parser is:
(function (
input) (or
input null) * boolean)
A parser can either succeed or fail when applied to an input at a given position. A succeeding parser is said to match. When a parser matches it can optionally produce a result value, which may be processed by its parent parsers. New parsers can be created by composing existing parsers using built-in or user defined parser combinators. A parser combinator is a higher-order function that includes one or more parsers as its arguments and returns a parser.
By convention, all parsers are defined as higher-order functions that return the respective parser, even if they are nullary. For instance, the ?end
parser is obtained using the form “(?end)
”.
A lexical convention is used to make three different types of parsers easily distinguishable:
- Parsers whose names start with a question mark (
?
) never produce a result value. - Parsers whose names start with an equals sign (
=
) always produce a result value. - Parsers whose names start with a percent symbol (
%
) may produce a result value depending on their arguments.
Example
We will define a parser for a simplistic grammar for Email addresses of the form:
email‑address := user @
host
First we define a parser for the valid characters in the user and host parts of an address. Just for this example, we choose these to be alphanumeric characters and then some. ?address-character
uses the %or
combinator to form the union of two instances of ?satisfies
that match different sets of characters.
Then we use ?address-character
to implement our address parser which matches two sequences of “address characters” separated by an @
character, and produces a list of user and host components as its result value. We match ?address-character
one or more times using %some
, and produce the matched subsequence as the result value using =subseq
. Both parts of the address separated by an @ character are matched in sequence using =list
, whose result value is finally transformed by =destructure
.
We can now apply =email-address
using parse
:
Overview
parse
is the entry point of MaxPC
. It applies a parser to an input and returns the parser’s result value, and two boolean values indicating if the parser matched and if there is unmatched input remaining, respectively.
Basic Parsers
?end
matches only when there is no further input.=element
unconditionally matches the next element of the input sequence.=subseq
produces the subsequence matched by its argument as its result value.?satisfies
,?test
, and?eq
match input conditionally.%maybe
matches, even if its argument fails to match.
Logical Combinators
%or
forms the union of its arguments.%and
forms the intersection of its arguments.%diff
forms the difference of its arguments.?not
negates its argument.
Sequence Combinators
?seq
matches its arguments in sequence.=list
matches its arguments in sequence and produces a list of their results as its result value.%any
matches its argument repeatedly any number of times.%some
matches its argument repeatedly one or more times.
Transformation
=transform
produces the result of applying a function to its argument’s result value as its result value.=destructure
is a convenient destructuring macro on top of=transform
.
Error Handling
?fail
never matches and evaluates its body when it is called.%handler-case
and%restart-case
allow you to set up handlers and restarts across parsers.get-input-position
can be called in error situations to retrieve the current position in the input.
Caveat: Recursive Parsers
A recursive parser can not refer to itself by its constructor, but must instead call itself by symbol—calling its constructor would otherwise result in unbounded recursion. In order to do so the parser function needs to be bound in the function namespace using setf
. The example below implements a parser for balanced parentheses, and illustrates how to avoid this common caveat.
%and (Function)
Syntax:
— Function: %and &rest
parsers
Arguments and Values:
parsers—parsers.
Description:
%and
applies parsers, and matches the last parser only if all previous parsers succeed. If the last parser produces a result value then %and
produces that value as its result value. It can be said that %and
forms the set intersection of parsers.
Examples:
%any (Function)
Syntax:
— Function: %any parser
Arguments and Values:
parser—a parser.
Description:
%any
matches parser in sequence any number of times. If parser produces a result value and matches at least once then %any
produces a list of the values as its result value.
Examples:
%diff (Function)
Syntax:
— Function: %diff parser &rest
not‑parsers
Arguments and Values:
parser—a parser.
not‑parsers—parsers.
Description:
%diff
matches parser only if applying not‑parsers fails. If parser produces a result value then %diff
produces that value as its result value. It can be said that %diff
forms the set difference of parser and the union of not‑parsers.
Examples:
%handler-case (Macro)
Syntax:
— Macro: %handler-case parser &body
clauses
clauses::= {
↓error‑clause}
*
error‑clause::= (
typespec ([
var])
{
declaration}
* {
form}
* parser‑form)
Arguments and Values:
parser—a parser.
typespec—a type specifier.
var—a variable name.
declaration—a declare
expression; not evaluated.
form—a form.
parser‑form—a form that evaluates to a parser.
Description:
%handler-case
executes parser in a dynamic environment where handlers are active as if by handler-case
. If a condition is handled by %handler-case
, parser‑form is evaluated and the resulting parser is applied.
Examples:
See Also:
handler-case
.
%maybe (Function)
Syntax:
— Function: %maybe parser
Arguments and Values:
parser—a parser.
Description:
%maybe
matches parser or nothing all, but always succeeds. If parser matches and produces a result value then %maybe
produces that value as its result value.
Examples:
%or (Function)
Syntax:
— Function: %or &rest
parsers
Arguments and Values:
parsers—parsers.
Description:
%or
attempts to successfully apply parsers, and matches the first succeeding parser, if any. If that parser produces a result value then %or
produces that value as its result value. It can be said that %or
forms the set union of parsers.
Examples:
%restart-case (Macro)
Syntax:
— Macro: %restart-case parser &rest
clauses
clauses::= {
↓restart‑clause}
*
restart‑clause::= (
case‑name ([
lambda‑list])
〚:interactive
interactive‑expression | :report
report‑expression | :test
test‑expression〛 {
declaration}
* {
form}
* parser‑form)
Arguments and Values:
parser—a parser.
case‑name—a symbol or nil
.
lambda‑list—an ordinary lambda list.
interactive‑expression—a symbol or a lambda expression.
report‑expression—a string, a symbol, or a lambda expression.
test‑expression—a symbol or a lambda expression.
declaration—a declare
expression; not evaluated.
form—a form.
parser‑form—a form that evaluates to a parser.
Description:
%restart-case
executes parser in a dynamic environment where restarts are active as if by restart-case
. If control is transferred to a restart‑clause, parser‑form is evaluated and the resulting parser is applied.
Examples:
See Also:
restart-case
.
%some (Function)
Syntax:
— Function: %some parser
Arguments and Values:
parser—a parser.
Description:
%some
matches parser in sequence one or more times. If parser produces a result value then %some
produces a list of the values as its result value.
Examples:
=destructure (Macro)
Syntax:
— Macro: =destructure (
&rest
lambda‑list)
parser &body
forms
Arguments and Values:
lambda‑list—a destructuring lambda list.
parser—a parser.
forms—an implicit progn.
Description:
=destructure
matches parser and destructures its result value as if by destructuring-bind
. The _
(underscore) symbol can occur in lambda‑list any number of times, and is substituted with a fresh, uninterned symbol and declared ignorable
. If parser matches =destructure
evaluates forms and produces the value of the last form as its result value. If no forms are supplied the value of the last, interned variable defined in lambda‑list is produced as the result value instead.
Examples:
Exceptional Situations:
If the result value of parser does not match the destructuring pattern, an error of type program-error
is signaled.
See Also:
destructuring-bind
=element (Function)
Syntax:
— Function: =element <no arguments>
Description:
=element
matches the next element and produces that element it as its result value.
Examples:
=list (Function)
Syntax:
— Function: =list &rest
parsers
Arguments and Values:
parsers—parsers.
Description:
=list
matches parsers in sequence, and produces a list of the result values of parsers as its result value.
Examples:
=subseq (Function)
Syntax:
— Function: =subseq parser
Arguments and Values:
parser—a parser.
Description:
=subseq
matches parser, and produces the subsequence matched by parser as its result value.
Examples:
=transform (Function)
Syntax:
— Function: =transform parser function
Arguments and Values:
parser—a parser.
function—a designator for a function of one argument.
Description:
=transform
matches parser and produces the result of applying function to the result value of parser as its result value.
Examples:
?end (Function)
Syntax:
— Function: ?end <no arguments>
Description:
?end
matches the end of input.
Examples:
?eq (Function)
Syntax:
— Function: ?eq x &optional
parser
Arguments and Values:
x—an object.
parser—a parser. The default is (=element)
.
Description:
?eq
matches parser if its result value is eq
to x.
Examples:
See also:
?satisfies
?fail (Macro)
Syntax:
— Macro: ?fail &body
forms
Arguments and Values:
forms—forms.
Description:
?fail
always fails to match, and evaluates forms when it is applied.
Examples:
?not (Function)
Syntax:
— Function: ?not parser
Arguments and Values:
parser—a parser.
Description:
?not
matches the next element unless parser matches.
Examples:
?satisfies (Function)
Syntax:
— Function: ?satisfies test &optional
parser
Arguments and Values:
test—a designator for a function of one argument that returns a generalized boolean.
parser—a parser. The default is (=element)
.
Description:
?satisfies
matches parser if its result value satisfies the test.
Examples:
?seq (Function)
Syntax:
— Function: ?seq &rest
parsers
Arguments and Values:
parsers—parsers.
Description:
?seq
matches parsers in sequence.
Examples:
?test (Macro)
Syntax:
— Macro: ?test (
test &rest
arguments)
&optional
parser
Arguments and Values:
test—a designator for a function that returns a generalized boolean.
arguments—objects.
parser—a parser. The default is (=element)
.
Description:
?test
matches parser if its result value satisfies the test with arguments as additional arguments to test.
Examples:
Notes:
Exceptional Situations:
If test accepts less arguments than the number of arguments plus one an error of type program-error
is signaled.
See also:
?satisfies
get-input-position (Function)
Syntax:
— Function: get-input-position <no arguments>
→ position
→ position, line, column
Arguments and Values:
position, column—non-negative integers.
line—a positive integer.
Description:
get-input-position
returns the number of elements read from the input so far. Additionally, line and column positions are returned if the input's element type is character
. Lines are counted starting at one while columns are counted starting from zero.
get-input-position
may only be called from within the body of ?fail
, the handlers of %handler-case
or the restarts of %restart-case
.
Exceptional Situations:
If get-input-position
is not evaluated within the dynamic context of ?fail
, %handler-case
or %restart-case
an error of type simple-error
is signaled.
parse (Function)
Syntax:
— Function: parse input‑source parser
→ result, match‑p, end‑p
Arguments and Values:
input-source—an input source.
parser—a parser.
result—an object.
match‑p, end‑p—generalized booleans.
Description:
parse
applies parser to the input and returns the parser’s result value or nil
. Match‑p is true if parser matched the input-source. End‑p is true if parser matched the complete input-source. Input is derived from input-source by using maxpc.input:make-input
.
parse
accepts input sources of type sequence
and stream
out of the box.
See Also:
maxpc.char (Package)
Utility parsers for character inputs.
*whitespace* (Variable)
Initial Value:
(#\Tab #\Newline #\PageUp #\Page #\Return #\ )
Value Type:
a list of characters.
Description:
The value of *whitespace*
is a list of characters considered to be whitespace characters.
=line (Function)
Syntax:
— Function: =line &optional
keep‑newline‑p
Arguments and Values:
keep‑newline‑p—a generalized boolean. The default is false.
Description:
=line
matches zero or more characters in sequence followed by a #\Newline
character or the end of input, and produces the string of characters as its result value. The terminating #\Newline
character is not included in string unless keep‑newline‑p is true.
Examples:
Exceptional Situations:
If an element attempted to be matched is not a character an error of type type-error
is signaled.
?char (Function)
Syntax:
— Function: ?char char &optional
case‑sensitive‑p
Arguments and Values:
char—a character.
case‑sensitive‑p—a generalized boolean. The default is true.
Description:
?char
matches char. ?char
is case sensitive unless case‑sensitive‑p is false.
Exceptional Situations:
If the next element is not a character an error of type type-error
is signaled.
?newline (Function)
Syntax:
— Function: ?newline <no arguments>
Description:
?newline
matches the #\Newline
character.
?string (Function)
Syntax:
— Function: ?string string &optional
case‑sensitive‑p
Arguments and Values:
string—a string.
case‑sensitive‑p—a generalized boolean. The default is true.
Description:
?string
matches the characters in string in sequence. ?string
is case sensitive unless case‑sensitive‑p is false.
Exceptional Situations:
If an element attempted to be matched is not a character an error of type type-error
is signaled.
?whitespace (Function)
Syntax:
— Function: ?whitespace <no arguments>
Description:
?whitespace
matches an element that is a member of *whitespace*
.
Exceptional Situations:
If the next element is not a character an error of type type-error
is signaled.
maxpc.digit (Package)
Parsers for digit numerals in character inputs.
=integer-number (Function)
Syntax:
— Function: =integer-number &optional
radix
Arguments and Values:
radix—a number of type (integer 2 36)
. The default is 10
.
Description:
=integer-number
matches one or more digit characters in the specified radix, optionally lead by a sign character, in sequence, and produces the integer represented by that sequence as its result value. The leading sign can be #\+
and #\-
for positive and negative values respectively. The default is a positive value.
Examples:
Exceptional Situations:
If an element attempted to be matched is not a character an error of type type-error
is signaled.
=natural-number (Function)
Syntax:
— Function: =natural-number &optional
radix
Arguments and Values:
radix—a number of type (integer 2 36)
. The default is 10
.
Description:
=natural-number
matches one or more digit characters in the specified radix in sequence, and produces the natural number represented by that digit sequence as its result value.
Examples:
Exceptional Situations:
If an element attempted to be matched is not a character an error of type type-error
is signaled.
?digit (Function)
Syntax:
— Function: ?digit &optional
radix
Arguments and Values:
radix—a number of type (integer 2 36)
. The default is 10
.
Description:
?digit
matches a single digit character in the specified radix.
Exceptional Situations:
If the next element is not a character an error of type type-error
is signaled.
maxpc.input (Package)
The generic input interface allows extensions to parse other types of input sources. To add a new input source type, make-input
has to be specialized on that type. The following methods have to be defined for inputs:
input-empty-p
input-first
input-rest
The following methods can optionally be defined for inputs:
input-position
input-element-type
input-sequence
input-element-type (Generic Function)
Syntax:
— Generic Function: input-element-type input
→ typespec
Arguments and Values:
input—an input.
typespec—a type specifier.
Description:
input-element-type
returns a type specifier that designates the type of the elements in input.
input-empty-p (Generic Function)
Syntax:
— Generic Function: input-empty-p input
→ empty-p
Arguments and Values:
input—an input.
empty-p—a generalized boolean.
Description:
input-empty-p
returns true if input is empty.
input-first (Generic Function)
Syntax:
— Generic Function: input-first input
→ element
Arguments and Values:
input—a non-empty input.
element—an object of the type designated by the type specifier returned by input-element-type
when called on input.
Description:
input-first
returns the first element in input.
Exceptional Situations:
If input is empty the behavior of input-first
is unspecified.
input-position (Generic Function)
Syntax:
— Generic Function: input-position input
→ position
Arguments and Values:
input—an input.
position—an integer between 0 and array-dimension-limit
inclusively.
Description:
input-position
returns the position of input.
input-rest (Generic Function)
Syntax:
— Generic Function: input-rest input
→ rest
Arguments and Values:
input—a non-empty input.
rest—the remaining input.
Description:
input-rest
returns the remaining input without the first element.
Exceptional Situations:
If input is empty the behavior of input-rest
is unspecified.
input-sequence (Generic Function)
Syntax:
— Generic Function: input-sequence input length
→ sequence
Arguments and Values:
input—an input.
length—an integer between 0 and array-dimension-limit
inclusively.
sequence—a sequence.
Description:
input-sequence
returns a sequence of the next length elements in input.
Exceptional Situations:
If the number of elements in input are less than length the behavior of input-sequence
is unspecified.
make-input (Generic Function)
Syntax:
— Generic Function: make-input input‑source
Arguments and Values:
input-source—an object.
Description:
make-input
returns an input for input-source.
maxpc.input.stream (Package)
Implements support for input sources of type stream
. Input from streams is copied into a temporary buffer lazily as required by the parser. Streams of type file-stream
are read in as chunks of customizable size.
*bound* (Variable)
Initial Value:
NIL
Description:
*bound*
can be set to limit the number of elements read from stream inputs in a single call to to parse
.
*chunk-size* (Variable)
Initial Value:
1000000
Description:
*chunk-size*
controls the size by which the buffer used for stream inputs grows, and the number of elements read at a time when parsing from streams of type file-stream
.
*element-type* (Variable)
Initial Value:
NIL
Description:
*element-type*
can be set to enforce a specific stream element type when reading from stream inputs. This can be useful when dealing with bivalent streams.