Character Categories - ANSI Common Lisp

Next: Identity of Characters, Previous: Character Attributes, Up: Character Concepts

13.1.4 Character Categories

There are several (overlapping) categories of characters that have no formally associated type but that are nevertheless useful to name. They include graphic characters, alphabetic₁ characters, characters with case (uppercase and lowercase characters), numeric characters, alphanumeric characters, and digits (in a given radix).

For each implementation-defined attribute of a character, the documentation for that implementation must specify whether characters that differ only in that attribute are permitted to differ in whether are not they are members of one of the aforementioned categories.

Note that these terms are defined independently of any special syntax which might have been enabled in the current readtable.

13.1.4.1 Graphic Characters

Characters that are classified as graphic, or displayable, are each associated with a glyph, a visual representation of the character.

A graphic character is one that has a standard textual representation as a single glyph, such as A or * or =. Space, which effectively has a blank glyph, is defined to be a graphic.

Of the standard characters, newline is non-graphic and all others are graphic; see Section 2.1.3 (Standard Characters).

Characters that are not graphic are called non-graphic. Non-graphic characters are sometimes informally called “formatting characters” or “control characters.”

#\Backspace, #\Tab, #\Rubout, #\Linefeed, #\Return, and #\Page, if they are supported by the implementation, are non-graphic.

13.1.4.2 Alphabetic Characters

The alphabetic₁ characters are a subset of the graphic characters. Of the standard characters, only these are the alphabetic₁ characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

a b c d e f g h i j k l m n o p q r s t u v w x y z

Any implementation-defined character that has case must be alphabetic₁. For each implementation-defined graphic character that has no case, it is implementation-defined whether that character is alphabetic₁.

13.1.4.3 Characters With Case

The characters with case are a subset of the alphabetic₁ characters. A character with case has the property of being either uppercase or lowercase. Every character with case is in one-to-one correspondence with some other character with the opposite case.

13.1.4.3.1 Uppercase Characters

An uppercase character is one that has a corresponding lowercase character that is different (and can be obtained using char-downcase).

Of the standard characters, only these are uppercase characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

13.1.4.3.2 Lowercase Characters

A lowercase character is one that has a corresponding uppercase character that is different (and can be obtained using char-upcase).

Of the standard characters, only these are lowercase characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z

13.1.4.3.3 Corresponding Characters in the Other Case

The uppercase standard characters A through Z mentioned above respectively correspond to the lowercase standard characters a through z mentioned above. For example, the uppercase character E corresponds to the lowercase character e, and vice versa.

13.1.4.3.4 Case of Implementation-Defined Characters

An implementation may define that other implementation-defined graphic characters have case. Such definitions must always be done in pairs—one uppercase character in one-to-one correspondence with one lowercase character.

13.1.4.4 Numeric Characters

The numeric characters are a subset of the graphic characters. Of the standard characters, only these are numeric characters:

0 1 2 3 4 5 6 7 8 9

For each implementation-defined graphic character that has no case, the implementation must define whether or not it is a numeric character.

13.1.4.5 Alphanumeric Characters

The set of alphanumeric characters is the union of the set of alphabetic₁ characters and the set of numeric characters.

13.1.4.6 Digits in a Radix

What qualifies as a digit depends on the radix (an integer between 2 and 36, inclusive). The potential digits are:

0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Their respective weights are 0, 1, 2, ... 35. In any given radix n, only the first n potential digits are considered to be digits. For example, the digits in radix 2 are 0 and 1, the digits in radix 10 are 0 through 9, and the digits in radix 16 are 0 through F.

Case is not significant in digits; for example, in radix 16, both F and f are digits with weight 15.