Floating-point Computations - ANSI Common Lisp


12.1.4 Floating-point Computations

The following rules apply to floating-point computations.

12.1.4.1 Rule of Float and Rational Contagion

When rationals and floats are combined by a numerical function, the rational is first converted to a float of the same format as the float argument. For functions such as + that take more than two arguments, it is permitted that part of the operation be carried out exactly using rationals and the rest be done using floating-point arithmetic.

When rationals and floats are compared by a numerical function, the function rational is effectively called to convert the float to a rational and then an exact comparison is performed. In the case of complex numbers, the real and imaginary parts are effectively handled individually.

12.1.4.1.1 Examples of Rule of Float and Rational Contagion

 ;;;; Combining rationals with floats.
 ;;; This example assumes an implementation in which
 ;;; (float-radix 0.5) is 2 (as in IEEE) or 16 (as in IBM/360),
 ;;; or else some other implementation in which 1/2 has an exact
 ;;; representation in floating point.
 (+ 1/2 0.5) → 1.0
 (- 1/2 0.5d0) → 0.0d0
 (+ 0.5 -0.5 1/2) → 0.5

 ;;;; Comparing rationals with floats.
 ;;; This example assumes an implementation in which the default float
 ;;; format is IEEE single-float, IEEE double-float, or some other format
 ;;; in which 5/7 is rounded upwards by FLOAT.
 (< 5/7 (float 5/7)) → true
 (< 5/7 (rational (float 5/7))) → true
 (< (float 5/7) (float 5/7)) → false
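
As a further illustration of the comparison rule, the following sketch contrasts a rational that has an exact floating-point representation with one that does not. It assumes a radix-2 float format such as IEEE single-float; the exact rational returned for 0.1 depends on the format and is not shown.

```lisp
;;; Assumption: floats use radix 2 (as in IEEE), so 1/2 is exactly
;;; representable and 1/10 is not.
(= 1/2 0.5)             ; true: (rational 0.5) is exactly 1/2
(= 1/10 0.1)            ; false: (rational 0.1) is close to, but not, 1/10
(= (rational 0.1) 0.1)  ; true: RATIONAL converts the float exactly
```

Because the float is converted to a rational before the comparison, no rounding of the rational argument can hide the difference between 1/10 and 0.1.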

12.1.4.2 Rule of Float Approximation

Computations with floats are only approximate, although they are described as if the results were mathematically accurate. Two mathematically identical expressions may be computationally different because of errors inherent in the floating-point approximation process.

The precision of a float is not necessarily correlated with the accuracy of that number. For instance, 3.142857142857142857 is a more precise approximation to π than 3.14159, but the latter is more accurate. Precision refers to the number of bits retained in the representation.

When an operation combines a short float with a long float, the result will be a long float. Common Lisp functions assume that the accuracy of their arguments does not exceed their precision. Therefore, when two floats of a smaller format are combined, the result is a float of that same format. Common Lisp functions never convert automatically from a larger format to a smaller one.
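
The point that mathematically identical expressions may differ computationally can be sketched with two groupings of the same sum. This example assumes that double-float is IEEE 754 binary64. The helper approx= and its default tolerance are hypothetical names chosen for illustration; they are not part of Common Lisp.

```lisp
;;; Assumption: double-float is IEEE 754 binary64.
(defun approx= (a b &optional (tolerance 1d-9))
  "Hypothetical helper: true if A and B differ by at most TOLERANCE,
taken relative to the larger of their magnitudes."
  (<= (abs (- a b))
      (* tolerance (max (abs a) (abs b)))))

(let ((left  (+ (+ 0.1d0 0.2d0) 0.3d0))
      (right (+ 0.1d0 (+ 0.2d0 0.3d0))))
  (list (= left right)          ; false under the stated assumption
        (approx= left right)))  ; true: the two sums agree within tolerance
```

A relative tolerance is only one possible design choice; a suitable bound depends on the magnitudes involved and on how much error the preceding computation may have accumulated.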

12.1.4.3 Rule of Float Underflow and Overflow

An error of type floating-point-overflow or floating-point-underflow should be signaled if a floating-point computation causes exponent overflow or underflow, respectively.
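
A program that wishes to recover from these conditions might handle them as in the following minimal sketch. The function try-scale and its keyword return values are hypothetical. Whether either condition is actually signaled depends on the implementation and its trap settings; some implementations return an infinity, a denormalized number, or zero instead, so no results are shown.

```lisp
(defun try-scale (x factor)
  "Hypothetical helper: return (* X FACTOR), or a keyword naming the
floating-point condition that was signaled during the multiplication."
  (handler-case (* x factor)
    (floating-point-overflow  () :overflow)
    (floating-point-underflow () :underflow)))

;; Candidate for exponent overflow:
(try-scale most-positive-double-float 2d0)

;; Candidate for exponent underflow:
(try-scale least-positive-normalized-double-float 1d-300)
```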

12.1.4.4 Rule of Float Precision Contagion

The result of a numerical function is a float of the largest format among all the floating-point arguments to the function.
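
The rule can be checked with typep, as in the sketch below. The last form additionally assumes IEEE single and double formats; it illustrates that the wider result carries no more accuracy than the narrower argument had.

```lisp
(typep (+ 1.0f0 2.0d0) 'double-float)      ; true
(typep (* 1/3 2.0f0 4.0d0) 'double-float)  ; true: the rational does not matter
(typep (+ 1.0f0 2.0f0) 'single-float)      ; true: no automatic widening

;;; Assumption: IEEE single-float and double-float.
;;; The single-float 0.1f0 is widened exactly, so it keeps its
;;; single-float rounding error and differs from the double 0.1d0.
(= (+ 0.1f0 0.0d0) 0.1d0)                  ; false
```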