FPLIB - MOTOROLA DSP56000/1 FLOATING POINT SOFTWARE SUBROUTINE LIBRARY

                FORMAT DEFINITION - VERSION 2.0

Revision 1.0    August 29, 1986
Revision 1.1    March  25, 1987
Revision 2.0    October 5, 1987

INTRODUCTION

FPLIB is a set of floating point arithmetic subroutines for the
Motorola DSP56000/1 digital signal processor.  This HELP file defines
the storage format and arithmetic representation used by the DSP56000/1
floating point software subroutine library (FPLIB).  The handling of
exception cases is also discussed.  Subroutine calling conventions are
not discussed here but are given in the FPCALLS.HLP file.

FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT FORMAT

Floating point number - (m,e) including mantissa sign
        Decimal value = m * ( 2 ** ( e - ebias ))

        23_____________________0    23_______________________0
        | s .      m           |    |    0            e .    |
        |______________________|    |________________________|

m = 24 bit mantissa (two's complement, normalized fraction)
        23 bit mantissa precision plus 1 bit mantissa sign gives
        precision of approximately 7 decimal digits.  The 24 bit
        mantissa was chosen to maximize precision with efficient
        use of the DSP56000 MPY and MAC instructions.
        A hidden leading 1 is not implemented in this format.

        Binary encoding:  s.xxxxxxx xxxxxxxx xxxxxxxx
        Bit weight:       0  -1                      -23
                        -2  2                       2

        Largest positive mantissa    $7FFFFF = +0.99999988079071044921875
        Smallest positive mantissa   $400000 = +0.5
        Floating point zero mantissa $000000 = 0
        Smallest negative mantissa   $BFFFFF = -0.50000011920928955078125
        Largest negative mantissa    $800000 = -1.0
        Reserved mantissas           $000001 through $3FFFFF
                                     $C00000 through $FFFFFF

        Note that all reserved mantissas are illegal since they
        represent denormalized mantissas.
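
        As an illustration only (this is not part of FPLIB), the
        mantissa encoding above can be modelled in Python.  The
        function names are hypothetical.  The legality test assumes
        that a nonzero mantissa is normalized exactly when its sign
        bit and the next most significant bit differ, which matches
        the reserved ranges listed above.

```python
from fractions import Fraction

MANT_BITS = 24  # width of the stored mantissa word


def mantissa_to_fraction(word):
    """Interpret a 24 bit word as a two's complement fraction in [-1, 1)."""
    word &= 0xFFFFFF
    if word & 0x800000:              # sign bit set: value is word - 2**24
        word -= 1 << MANT_BITS
    return Fraction(word, 1 << 23)   # binary point sits after the sign bit


def is_legal_mantissa(word):
    """True for zero or for a normalized mantissa (bits 23 and 22 differ)."""
    word &= 0xFFFFFF
    if word == 0:
        return True                  # floating point zero mantissa
    return ((word >> 23) & 1) != ((word >> 22) & 1)
```

        For example, is_legal_mantissa(0x400000) is true, while
        is_legal_mantissa(0x3FFFFF) and is_legal_mantissa(0xC00000)
        are false because both words fall in the reserved ranges.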

e = 14 bit exponent (unsigned integer, biased by ebias = +8191)
        Stored as a 24 bit unsigned integer with 10 leading zeros.
        Exponent arithmetic is generally done with 16 bit precision.
        The 14 bit exponent format was chosen to maximize dynamic
        range with efficient detection of exponent overflow and
        exponent underflow.

        Binary encoding:  00000000 00xxxxxx xxxxxxxx.
        Bit weight:                   13            0
                                     2             2

        Largest exponent             $003FFF = 2 ** +8192
        Assumed fixed point exponent $001FFF = 2 ** +0 = +1.0
        Smallest exponent            $000000 = 2 ** -8191
        Reserved exponents           $004000 through $FFFFFF

                        14
        If bit weight  2   is set, exponent overflow has occurred.
                        15
        If bit weight  2   is set, exponent underflow has occurred.

        Note that no distinct exponents are reserved for plus infinity,
        minus infinity, Not a Number (IEEE NaN), minus zero or
        denormalized numbers.
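
        The overflow and underflow tests above can be sketched in
        Python.  This is a model under stated assumptions, not library
        code: the function names are hypothetical, the exponent result
        is assumed to be held as a 16 bit two's complement value, and
        the underflow bit is tested first because a negative result
        can set both bits.

```python
EBIAS = 0x1FFF  # exponent bias, +8191


def multiply_exponents(e1, e2):
    """Biased exponent of a product, before any normalization shift."""
    return (e1 + e2 - EBIAS) & 0xFFFF   # keep 16 bits, as in the text


def classify_exponent(e16):
    """Classify a 16 bit exponent result as 'ok', 'overflow' or 'underflow'."""
    e16 &= 0xFFFF
    if e16 & 0x8000:        # bit weight 2**15: result went negative
        return "underflow"
    if e16 & 0x4000:        # bit weight 2**14: result exceeds $003FFF
        return "overflow"
    return "ok"
```

        Multiplying two numbers with the largest exponent gives
        $3FFF + $3FFF - $1FFF = $5FFF, which sets only the overflow
        bit.  Multiplying two numbers with the smallest exponent
        gives -8191, or $E001 in 16 bits, which sets the underflow
        bit.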


FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT NUMBER RANGE

Largest positive floating point number -  m = $7FFFFF, e = $003FFF
        Decimal value = +0.1090748 E+2467

Smallest positive floating point number - m = $400000, e = $000000
        Decimal value = +0.9168019 E-2466

Smallest negative floating point number - m = $BFFFFF, e = $000000
        Decimal value = -0.9168022 E-2466

Largest negative floating point number -  m = $800000, e = $003FFF
        Decimal value = -0.1090748 E+2467

Floating point zero -                     m = $000000, e = $000000
        Decimal value = +0.0

Note that the two's complement mantissa does not have equal
positive and negative ranges.  Only sign-magnitude formats
possess this property.  These ranges should be checked after
most arithmetic operations.
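
The range limits can be recomputed from the definition
m * ( 2 ** ( e - ebias )).  The Python sketch below is a checking
aid only, not part of FPLIB.  It uses the standard decimal module
because these magnitudes lie far outside the range of an IEEE
double, and the name fplib_value is hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40      # ample for the 7 digits quoted in the text
EBIAS = 8191                # exponent bias from the format definition


def fplib_value(m_word, e_word):
    """Decimal value of m * 2**(e - ebias) for a mantissa/exponent pair."""
    if m_word == 0:
        return Decimal(0)   # floating point zero
    # Sign-extend the 24 bit two's complement mantissa word.
    m = m_word - (1 << 24) if m_word & 0x800000 else m_word
    return Decimal(m) / Decimal(1 << 23) * Decimal(2) ** (e_word - EBIAS)


if __name__ == "__main__":
    for name, m, e in [
        ("largest positive ", 0x7FFFFF, 0x3FFF),
        ("smallest positive", 0x400000, 0x0000),
        ("smallest negative", 0xBFFFFF, 0x0000),
        ("largest negative ", 0x800000, 0x3FFF),
    ]:
        print(name, format(fplib_value(m, e), "+.6E"))
```

The script prints each limit in scientific notation with one
digit before the decimal point, so its exponents are one less
than those in the 0.xxxxxxx form used above.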


FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT DSP56000/1 REGISTER USAGE

Sign Only       Mantissa        Exponent        Usage
                x1              x0              Input only
                y1              y0              Input only
a2              a1              b1              Input and output
                                r0,n0,m0        Reserved for FPLIB

The library subroutines do not preserve the contents of these registers
unless specifically noted in the function.  Accumulator a usually
contains the mantissa upon return from the subroutine.  Accumulator b
usually contains the exponent upon return from the subroutine.
The subroutines assume that the input variables are present in the
appropriate registers when the subroutine is called.


FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT DSP56000/1 MEMORY USAGE

The floating point mantissa and exponent may be stored in any
locations in any memory space.  The input and output register
values are organized so that the long (L:) addressing mode may
be used to load/store both the mantissa and exponent with one
instruction.  If the long addressing mode is used, the mantissa
is in X memory and the exponent is in Y memory at the same address.


COMPARISON TO ANSI/IEEE STD 754-1985 STANDARD FOR BINARY FLOATING
POINT ARITHMETIC

Since the IEEE Floating Point Arithmetic Standard is well
publicized, it is useful to compare these two floating point
formats.  This floating point format (FPLIB) differs from the
IEEE standard primarily in its handling of floating point exceptions.
Other differences are noted in the table below.  Conversion between
the IEEE standard format and this format is straightforward.

CHARACTERISTIC          FPLIB FORMAT            IEEE FORMAT
--------------          ------------            -----------

Mantissa Precision      23 bits                 24 bits

Hidden Leading One      No                      Yes

Mantissa Format         24 bit Two's            23 bit Unsigned
                        Complement Fraction     Magnitude Fraction

Exponent Width          16 bits (14 bits        8 bits (single)
                        used)                   11 bits (double)

Maximum Exponent        +8192                   +127 (single)
                                                +1023 (double)

Minimum Exponent        -8191                   -126 (single)
                                                -1022 (double)

Exponent Bias           +8191                   +127 (single)
                                                +1023 (double)

Format Width            48 bits                 32 bits (single)
                                                64 bits (double)

Rounding                Round to Nearest        Round to Nearest
                                                Round to +Infinity
                                                Round to -Infinity
                                                Round to Zero

Infinity Arithmetic     Saturation Limiting     Affine Operations

Denormalized Numbers    No (Forced to Zero)     Yes (With Minimum Exponent)

Exceptions              Divide by Zero          Invalid Operations
                        Overflow                Divide by Zero
                        Negative Square Root    Overflow
                                                Underflow
                                                Inexact Arithmetic


IEEE SINGLE PRECISION FORMAT

 _31_30______________23_22______________________0
| s |  8 bit exponent  |     23 bit mantissa    | 
|___|__________________|________________________|


IEEE DOUBLE PRECISION FORMAT

 _63_62______________________52_51_______________________________0
| s |      11 bit exponent     |         52 bit mantissa         | 
|___|__________________________|_________________________________|
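
One possible conversion between an IEEE value and an FPLIB
(m,e) pair is sketched below in Python.  It is an illustration
under stated assumptions rather than the library's own routine:
the function names are hypothetical, Python floats (IEEE double)
stand in for the IEEE operand, and the mantissa is rounded to
nearest even.  Two edge cases need care.  A positive mantissa
that rounds up to 1.0 is renormalized to 0.5, and a negative
mantissa of exactly -0.5 is stored as -1.0 with the exponent
reduced by one, since $C00000 is a reserved mantissa.

```python
import math

EBIAS = 8191  # FPLIB exponent bias


def float_to_fplib(x):
    """Return (mantissa_word, exponent_word) approximating the float x."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("FPLIB has no encoding for NaN or infinity")
    if x == 0.0:
        return 0x000000, 0x000000        # unsigned floating point zero
    frac, exp = math.frexp(x)            # x = frac * 2**exp, 0.5 <= |frac| < 1
    q = round(abs(frac) * (1 << 23))     # 23 bit magnitude, round to nearest even
    e = exp + EBIAS
    if x > 0:
        if q == 0x800000:                # rounded up to 1.0: renormalize to 0.5
            q, e = 0x400000, e + 1
        return q, e
    if q == 0x400000:                    # -0.5 is reserved: use -1.0, exponent - 1
        return 0x800000, e - 1
    return (-q) & 0xFFFFFF, e            # two's complement of the magnitude


def fplib_to_float(m_word, e_word):
    """Convert an FPLIB pair back to a float (may overflow or flush to 0.0)."""
    m = m_word - (1 << 24) if m_word & 0x800000 else m_word
    return math.ldexp(m, e_word - EBIAS - 23)
```

For example, float_to_fplib(1.0) returns ($400000, $002000),
that is 0.5 * 2 ** 1, and float_to_fplib(-1.0) returns
($800000, $001FFF), that is -1.0 * 2 ** 0.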


As shown in the table, the FPLIB mantissa precision is one bit less
than the IEEE single precision format.  This is a result of using
two's complement arithmetic in the DSP56000/1.  The FPLIB exponent
width is three bits more than the IEEE double precision format.
This provides an extremely large (approx. 100,000 dB) dynamic
range which eliminates exponent overflow for most applications.
If exponent overflow occurs, the result is limited to the maximum
representable floating point number of the correct sign.  If
exponent underflow occurs, the result is limited to the minimum
representable floating point number, which is zero.  Although
the FPLIB format does not provide the arithmetic safety offered
by the IEEE standard, it avoids extensive error checking and
exceptions in favor of real-time execution speed and efficient
implementation on the DSP56000/1.  All exception conditions are
handled "in-line" according to predefined rules.  This accepts
the fact that real-time systems have no choice but to provide an
output with some amount of error if an exception occurs.  It is
not possible to stop execution until the application program
determines a solution to the problem and fixes it.
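
The limiting rules described above can be sketched in Python.
This is a model for illustration, not the FPLIB code.  The name
limit_result is hypothetical, the exponent is assumed to arrive
as a 16 bit result with the overflow and underflow bits defined
in the format section, and the flag is raised only on overflow
because underflow is not listed among the FPLIB exceptions in
the comparison table.

```python
MAX_POSITIVE = (0x7FFFFF, 0x3FFF)   # largest positive number
MAX_NEGATIVE = (0x800000, 0x3FFF)   # largest negative number
ZERO = (0x000000, 0x0000)           # floating point zero


def limit_result(m_word, e16):
    """Return ((mantissa, exponent), overflow_flag) after limiting."""
    e16 &= 0xFFFF
    if m_word == 0 or (e16 & 0x8000):
        return ZERO, False               # zero, or underflow forced to zero
    if e16 & 0x4000:                     # exponent overflow: saturate by sign
        if m_word & 0x800000:
            return MAX_NEGATIVE, True
        return MAX_POSITIVE, True
    return (m_word, e16), False          # in range: pass through unchanged
```

A positive result with exponent $5FFF is therefore replaced by
($7FFFFF, $003FFF), while any result with exponent $E001 is
replaced by floating point zero.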

One major difference is the use of affine arithmetic in the IEEE
standard versus the use of saturation arithmetic in the FPLIB format.
Affine arithmetic gives separate identity to plus infinity, minus
infinity, plus zero and minus zero.  In operations involving these
values, finite quantities remain finite and infinite quantities
remain infinite.  In contrast, this format gives special identity
only to unsigned zero.  This format performs saturation arithmetic
such that any result out of the representable floating point range
is replaced with the closest floating point representation.  Since
the dynamic range of this format is quite large, it is adequate for
most applications.  Overflow limiting is analogous to the output of
an analog op amp clamping at the power supply rails.

The IEEE floating point standard provides extensive error handling
required by affine arithmetic, denormalized numbers, signaling NaNs
(Not a Number) and quiet NaNs.  It postpones introducing computation
errors by using internal signaling and user traps to process each
exception condition.  Computational errors will be introduced by
the application program if the calculation is completed instead of
aborting the program.  The FPLIB format introduces computation errors
when an exception occurs in order to maintain real-time execution.
An error flag (L bit in CCR) is set to inform the application program
that an exception has occurred.  This bit will remain set until reset
by the application program.  The user can then eliminate the exception
by modifying the algorithm.
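
The sticky behavior of the error flag can be modelled in a few
lines of Python.  The class below is a toy stand-in for the L bit,
with hypothetical names, and shows only the usage pattern: run a
block of operations at full speed, then test and clear the flag
once.

```python
class StatusModel:
    """Toy model of a sticky error flag such as the L bit in the CCR."""

    def __init__(self):
        self.l_bit = False

    def record(self, exception_occurred):
        """OR a new exception indication into the flag; never clears it."""
        self.l_bit = self.l_bit or bool(exception_occurred)

    def test_and_clear(self):
        """Return the flag and reset it, as the application program would."""
        was_set, self.l_bit = self.l_bit, False
        return was_set
```

Because the flag is only ever set by record(), a single exception
anywhere in the block is still visible when the application
program checks at the end.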
