Character Syntax Types - ANSI Common Lisp

The Lisp reader constructs an object from the input text by interpreting each character according to its syntax type. The Lisp reader cannot accept as input everything that the Lisp printer produces, and the Lisp reader has features that are not used by the Lisp printer. The Lisp reader can be used as a lexical analyzer for a more general user-written parser.

When the Lisp reader is invoked, it reads a single character from the input stream and dispatches according to the syntax type of that character. Every character that can appear in the input stream is of one of the syntax types shown in Figure 2.6.

constituent macro character single escape
invalid multiple escape whitespace₂

Figure 2.6: Possible Character Syntax Types

The syntax type of a character in a readtable determines how that character is interpreted by the Lisp reader while that readtable is the current readtable. At any given time, every character has exactly one syntax type.
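A minimal sketch of how the current readtable decides syntax types, assuming a conforming implementation. The same text is read twice: once with standard syntax and once with a private copy of the standard readtable (made by copy-readtable) in which set-syntax-from-char gives ! the syntax type of Space. The helper name read-with-bang-as-whitespace is hypothetical.

```lisp
;; Read STRING using a private copy of the standard readtable in which
;; #\! has been given the same syntax type as #\Space (whitespace[2]).
;; The global *readtable* is not modified.
(defun read-with-bang-as-whitespace (string)
  (let ((*readtable* (copy-readtable nil)))
    ;; Copy the syntax type of Space (from the standard readtable)
    ;; onto ! in the readtable that is current inside this LET.
    (set-syntax-from-char #\! #\Space)
    (read-from-string string)))

;; In standard syntax ! is a constituent, so "a!b" is a single token.
(read-from-string "a!b")              ; => A!B
;; With ! as whitespace[2], the token ends at !, so only A is read.
(read-with-bang-as-whitespace "a!b")  ; => A
```

Binding *readtable* with LET keeps the change local, so other code reading at the same time still sees standard syntax.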

character  syntax type                  character  syntax type
Backspace  constituent                  0–9        constituent
Tab        whitespace₂                  :          constituent
Newline    whitespace₂                  ;          terminating macro char
Linefeed   whitespace₂                  <          constituent
Page       whitespace₂                  =          constituent
Return     whitespace₂                  >          constituent
Space      whitespace₂                  ?          constituent*
!          constituent*                 @          constituent
"          terminating macro char       A–Z        constituent
#          non-terminating macro char   [          constituent*
$          constituent                  \          single escape
%          constituent                  ]          constituent*
&          constituent                  ^          constituent
'          terminating macro char       _          constituent
(          terminating macro char       `          terminating macro char
)          terminating macro char       a–z        constituent
*          constituent                  {          constituent*
+          constituent                  |          multiple escape
,          terminating macro char       }          constituent*
-          constituent                  ~          constituent
.          constituent                  Rubout     constituent
/          constituent

Figure 2.7: Character Syntax Types in Standard Syntax

The characters marked with an asterisk (*) are initially constituents, but they are not used in any standard Common Lisp notation; they are explicitly reserved to the programmer. The tilde (~) is not used in standard Common Lisp reader syntax and is reserved to implementors. $ and % are alphabetic₂ characters, but they do not appear in the name of any standardized Common Lisp defined name.
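A small sketch, assuming a conforming implementation with standard syntax: the programmer-reserved characters and the alphabetic₂ characters $ and % simply become part of symbol names. The helper standard-read is hypothetical; it reads with a fresh copy of the standard readtable so that any customization of the global *readtable* cannot affect the result.

```lisp
;; Read one object from STRING using a fresh copy of the standard readtable.
(defun standard-read (string)
  (let ((*readtable* (copy-readtable nil)))
    (read-from-string string)))

;; Reserved characters ! ? [ ] { } are plain constituents in standard syntax.
(symbol-name (standard-read "[x]"))    ; => "[X]"
(symbol-name (standard-read "{ok?}"))  ; => "{OK?}"
;; $ and % are alphabetic[2] constituents as well.
(symbol-name (standard-read "$foo%"))  ; => "$FOO%"
```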

Whitespace₂ characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions. Macro characters are divided into two kinds, terminating and non-terminating, depending on whether or not they terminate a token. The following are descriptions of each kind of syntax type.
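The roles described above can be seen by reading every object from a string, which also shows the reader acting as a simple lexical analyzer. This is a sketch under the assumption of standard syntax; read-all-from-string is a hypothetical helper, and the unique EOF marker is a common idiom rather than anything required here.

```lisp
;; Read every object in STRING. Whitespace[2] only separates objects,
;; runs of constituents become numbers or symbols, and ( and ' invoke
;; their reader macro functions.
(defun read-all-from-string (string)
  (with-input-from-string (in string)
    (loop with eof = (gensym "EOF")            ; marker that cannot be read
          for object = (read in nil eof)
          until (eq object eof)
          collect object)))

(read-all-from-string "  foo 42   (a b) 'c")
;; => (FOO 42 (A B) (QUOTE C))
```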

2.1.4.1 Constituent Characters

Constituent characters are used in tokens. A token is a representation of a number or a symbol. Examples of constituent characters are letters and digits.

Letters in symbol names are sometimes converted to letters in the opposite case when the name is read; see Section 23.1.2 (Effect of Readtable Case on the Lisp Reader). Case conversion can be suppressed by the use of single escape or multiple escape characters.
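A minimal sketch of case conversion under different readtable cases, assuming a fresh copy of the standard readtable. The helper name symbol-name-read-with-case is hypothetical; readtable-case and its values :upcase, :downcase, and :preserve are standard.

```lisp
;; Read STRING with a copy of the standard readtable whose readtable case
;; is CASE-MODE, and return the name of the symbol that was read.
(defun symbol-name-read-with-case (string case-mode)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) case-mode)
    (symbol-name (read-from-string string))))

(symbol-name-read-with-case "Foo" :upcase)    ; => "FOO"
(symbol-name-read-with-case "Foo" :downcase)  ; => "foo"
(symbol-name-read-with-case "Foo" :preserve)  ; => "Foo"
;; Escaped letters are never converted, even under :UPCASE.
(symbol-name-read-with-case "|Foo|" :upcase)  ; => "Foo"
```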

2.1.4.2 Constituent Traits

Every character has one or more constituent traits that define how the character is to be interpreted by the Lisp reader when the character is a constituent character. These constituent traits are alphabetic₂, digit, package marker, plus sign, minus sign, dot, decimal point, ratio marker, exponent marker, and invalid. Figure 2.8 shows the constituent traits of the standard characters and of certain semi-standard characters; no mechanism is provided for changing the constituent trait of a character. Any character with the alphadigit constituent trait in that figure is a digit if the current input base is greater than that character's digit value, otherwise the character is alphabetic₂. Any character quoted by a single escape is treated as an alphabetic₂ constituent, regardless of its normal syntax.
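The alphadigit rule depends on *read-base*, the current input base. A sketch, assuming standard syntax; read-in-base is a hypothetical helper. The last call shows that a single escape turns a would-be digit into an alphabetic₂ constituent, so the token can no longer be a number.

```lisp
;; Read STRING with standard syntax in the given input base.
(defun read-in-base (string base)
  (let ((*readtable* (copy-readtable nil))
        (*read-base* base))
    (read-from-string string)))

;; f has digit value 15, which is not below 10, so it is alphabetic[2].
(read-in-base "ff" 10)    ; => FF
;; In base 16, f is a digit, so the token is the integer #xFF.
(read-in-base "ff" 16)    ; => 255
;; The escaped f keeps its case and is alphabetic[2]: a symbol named "fF".
(read-in-base "\\ff" 16)  ; => |fF|
```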

character  constituent traits  character  constituent traits
Backspace  invalid             {          alphabetic₂
Tab        invalid*            }          alphabetic₂
Newline    invalid*            +          alphabetic₂, plus sign
Linefeed   invalid*            -          alphabetic₂, minus sign
Page       invalid*            .          alphabetic₂, dot, decimal point
Return     invalid*            /          alphabetic₂, ratio marker
Space      invalid*            A, a       alphadigit
!          alphabetic₂         B, b       alphadigit
"          alphabetic₂*        C, c       alphadigit
#          alphabetic₂*        D, d       alphadigit, double-float exponent marker
$          alphabetic₂         E, e       alphadigit, float exponent marker
%          alphabetic₂         F, f       alphadigit, single-float exponent marker
&          alphabetic₂         G, g       alphadigit
'          alphabetic₂*        H, h       alphadigit
(          alphabetic₂*        I, i       alphadigit
)          alphabetic₂*        J, j       alphadigit
*          alphabetic₂         K, k       alphadigit
,          alphabetic₂*        L, l       alphadigit, long-float exponent marker
0-9        alphadigit          M, m       alphadigit
:          package marker      N, n       alphadigit
;          alphabetic₂*        O, o       alphadigit
<          alphabetic₂         P, p       alphadigit
=          alphabetic₂         Q, q       alphadigit
>          alphabetic₂         R, r       alphadigit
?          alphabetic₂         S, s       alphadigit, short-float exponent marker
@          alphabetic₂         T, t       alphadigit
[          alphabetic₂         U, u       alphadigit
\          alphabetic₂*        V, v       alphadigit
]          alphabetic₂         W, w       alphadigit
^          alphabetic₂         X, x       alphadigit
_          alphabetic₂         Y, y       alphadigit
`          alphabetic₂*        Z, z       alphadigit
|          alphabetic₂*        Rubout     invalid
~          alphabetic₂

Figure 2.8: Constituent Traits of Standard Characters and Semi-Standard Characters

The interpretations in this table apply only to characters whose syntax type is constituent. Entries marked with an asterisk (*) are normally shadowed₂ because the indicated characters are of syntax type whitespace₂, macro character, single escape, or multiple escape; these constituent traits apply to them only if their syntax types are changed to constituent.
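Several traits from Figure 2.8 can be observed by reading tokens built only from constituents. This sketch assumes standard syntax and input base 10; read-standard is a hypothetical helper. The short-float check uses typep because some implementations make short-float the same type as single-float.

```lisp
;; Read STRING with standard syntax and decimal input base.
(defun read-standard (string)
  (let ((*readtable* (copy-readtable nil))
        (*read-base* 10))
    (read-from-string string)))

(read-standard "-5")      ; minus sign             => -5
(read-standard "5.")      ; trailing decimal point => the integer 5
(read-standard "1/2")     ; ratio marker           => 1/2
(read-standard "1.5d0")   ; D exponent marker      => a DOUBLE-FLOAT
(read-standard "2s0")     ; S exponent marker      => a SHORT-FLOAT
(read-standard "cl:car")  ; package marker         => the symbol CAR in COMMON-LISP
```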

2.1.4.3 Invalid Characters

Characters with the constituent trait invalid cannot ever appear in a token except under the control of a single escape character. If an invalid character is encountered while an object is being read, an error of type reader-error is signaled. If an invalid character is preceded by a single escape character, it is treated as an alphabetic₂ constituent instead.
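A sketch of the invalid trait using Rubout, which Figure 2.7 lists with syntax type constituent and Figure 2.8 lists with trait invalid. It assumes the implementation supports the semi-standard character name #\Rubout; try-read is a hypothetical helper that turns a reader-error into a keyword so both outcomes can be compared.

```lisp
;; Return the object read from STRING with standard syntax,
;; or :READER-ERROR if reading signals an error of that type.
(defun try-read (string)
  (handler-case (let ((*readtable* (copy-readtable nil)))
                  (read-from-string string))
    (reader-error () :reader-error)))

;; Unescaped Rubout inside a token: a reader-error is signaled.
(try-read (format nil "ab~Ccd" #\Rubout))    ; => :READER-ERROR
;; After a single escape, Rubout is an alphabetic[2] constituent,
;; so this reads a symbol whose five-character name contains Rubout.
(try-read (format nil "ab\\~Ccd" #\Rubout))
```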

2.1.4.4 Macro Characters

When the Lisp reader encounters a macro character on an input stream, special parsing of subsequent characters on the input stream is performed.

A macro character has an associated function called a reader macro function that implements its specialized parsing behavior. An association of this kind can be established or modified under control of a conforming program by using the functions set-macro-character and set-dispatch-macro-character.

Upon encountering a macro character, the Lisp reader calls its reader macro function, which parses one specially formatted object from the input stream. The function either returns the parsed object, or else it returns no values to indicate that the characters scanned by the function are being ignored (e.g., in the case of a comment). Examples of macro characters are backquote, single-quote, left-parenthesis, and right-parenthesis.
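A minimal sketch of user-defined reader macro functions installed with set-macro-character, assuming a fresh copy of the standard readtable. It uses the programmer-reserved characters ! and ?; the function names read-negation, skip-next-object, and make-demo-readtable are hypothetical. The first function returns a parsed object; the second returns no values, so whatever it scanned is ignored, much as a comment would be.

```lisp
;; Reader macro function for !: read the next object, return (NOT object).
(defun read-negation (stream char)
  (declare (ignore char))
  (list 'not (read stream t nil t)))      ; recursive-p = T: nested read

;; Reader macro function for ?: read and discard the next object.
;; Returning no values tells the reader to ignore what was scanned.
(defun skip-next-object (stream char)
  (declare (ignore char))
  (read stream t nil t)
  (values))

;; A copy of the standard readtable with both terminating macro characters.
(defun make-demo-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\! #'read-negation nil rt)
    (set-macro-character #\? #'skip-next-object nil rt)
    rt))

(let ((*readtable* (make-demo-readtable)))
  (list (read-from-string "!done")             ; => (NOT DONE)
        (read-from-string "?(ignored) kept"))) ; => KEPT
```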

A macro character is either terminating or non-terminating. The difference between terminating and non-terminating macro characters lies in what happens when such characters occur in the middle of a token. If a non-terminating macro character occurs in the middle of a token, the function associated with the non-terminating macro character is not called, and the non-terminating macro character does not terminate the token's name; it becomes part of the name as if the macro character were really a constituent character. A terminating macro character terminates any token, and its associated reader macro function is called no matter where the character appears. The only non-terminating macro character in standard syntax is sharpsign.
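The difference shows up directly in where a token ends. A sketch assuming standard syntax; read-token-standard is a hypothetical helper that returns both values of read-from-string, the object and the index where reading stopped. The second value of get-macro-character is a generalized boolean that is true for non-terminating macro characters.

```lisp
;; Read from STRING with standard syntax; returns the object and end index.
(defun read-token-standard (string)
  (let ((*readtable* (copy-readtable nil)))
    (read-from-string string)))

(read-token-standard "a#b")  ; => A#B, 3  (# is non-terminating: part of the token)
(read-token-standard "a'b")  ; => A, 1    (' is terminating: ends the token, left unread)

;; Second value of GET-MACRO-CHARACTER: true only for non-terminating macros.
(nth-value 1 (get-macro-character #\# (copy-readtable nil)))  ; true
(nth-value 1 (get-macro-character #\' (copy-readtable nil)))  ; => NIL
```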

If a character is a dispatching macro character C₁, its reader macro function is a function supplied by the implementation. This function reads decimal digit characters until a non-digit C₂ is read. If any digits were read, they are converted into a corresponding integer infix parameter P; otherwise, the infix parameter P is nil. The terminating non-digit C₂ is a character (sometimes called a “sub-character” to emphasize its subordinate role in the dispatching) that is looked up in the dispatch table associated with the dispatching macro character C₁. The reader macro function associated with the sub-character C₂ is invoked with three arguments: the stream, the sub-character C₂, and the infix parameter P. For more information about dispatch characters, see the function set-dispatch-macro-character.
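To make the three arguments visible, this sketch installs a function for the sub-character ? under the dispatching macro character #, in a copy of the standard readtable (#? is not defined in standard syntax). The function simply returns the sub-character C₂ and the infix parameter P it was given. The names show-dispatch-arguments and make-dispatch-demo-readtable are hypothetical.

```lisp
;; Called by the # dispatcher as (function stream sub-char infix-parameter).
(defun show-dispatch-arguments (stream sub-char infix-parameter)
  (declare (ignore stream))
  (list sub-char infix-parameter))

;; A copy of the standard readtable in which #? is handled by the function above.
(defun make-dispatch-demo-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\? #'show-dispatch-arguments rt)
    rt))

(let ((*readtable* (make-dispatch-demo-readtable)))
  (list (read-from-string "#?")     ; => (#\? NIL)  no digits, so P is NIL
        (read-from-string "#42?"))) ; => (#\? 42)   digits become P
```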

2.1.4.5 Multiple Escape Characters

A pair of multiple escape characters indicates that the characters enclosed between them, including any macro characters and whitespace₂ characters, are to be treated as alphabetic₂ characters with case preserved. Any single escape and multiple escape characters that are to appear in the sequence must be preceded by a single escape character.

2.1.4.5.1 Examples of Multiple Escape Characters
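The following sketch assumes a fresh copy of the standard readtable, whose readtable case is :upcase; name-of is a hypothetical helper that returns the name of the symbol read from a string (note that a backslash inside a Lisp string literal is itself written as \\).

```lisp
;; Return the name of the symbol read from STRING with standard syntax.
(defun name-of (string)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (read-from-string string))))

(name-of "abc")       ; => "ABC"     unescaped letters are upcased
(name-of "|abc|")     ; => "abc"     case preserved between the bars
(name-of "a|B|c")     ; => "ABC"     escapes may cover part of a token
(name-of "|a b(c)|")  ; => "a b(c)"  whitespace and macro characters kept
(name-of "|a\\|b|")   ; => "a|b"     single escape lets | appear inside
```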

2.1.4.6 Single Escape Character

A single escape is used to indicate that the next character is to be treated as an alphabetic₂ character with its case preserved, no matter what the character is or which constituent traits it has.
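A sketch of single escapes under standard syntax, assuming a fresh copy of the standard readtable; escaped-name is a hypothetical helper. Each escaped character keeps its case and becomes an alphabetic₂ constituent, so an escaped digit yields a symbol rather than a number.

```lisp
;; Return the name of the symbol read from STRING with standard syntax.
(defun escaped-name (string)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (read-from-string string))))

(escaped-name "a\\bc")  ; => "AbC"  escaped b keeps its case
(escaped-name "a\\ b")  ; => "A B"  escaped Space does not end the token
(escaped-name "\\(x")   ; => "(X"   escaped ( does not invoke its macro
(escaped-name "\\1")    ; => "1"    escaped digit: a symbol, not a number
```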
