FPLIB - MOTOROLA DSP56000/1 FLOATING POINT SOFTWARE SUBROUTINE LIBRARY FORMAT DEFINITION - VERSION 2.0 Revision 1.0 August 29, 1986 Revision 1.1 March 25, 1987 Revision 2.0 October 5, 1987 INTRODUCTION FPLIB is a useful set of floating point arithmetic subroutines for the Motorola DSP56000/1 digital signal processor. This HELP file defines the storage format and arithmetic representation used by the DSP56000/1 floating point software subroutine library (FPLIB). The handling of exception cases is also discussed. Subroutine calling conventions are not discussed here but are given in the FPCALLS.HLP file. FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT FORMAT Floating point number - (m,e) including mantissa sign Decimal value = m * ( 2 ** ( e - ebias )) 23_____________________0 23_______________________0 | s . m | | 0 e . | |______________________| |________________________| m = 24 bit mantissa (two's complement, normalized fraction) 23 bit mantissa precision plus 1 bit mantissa sign gives precision of approximately 7 decimal digits. The 24 bit mantissa was chosen to maximize precision with efficient use of the DSP56000 MPY and MAC instructions. A hidden leading 1 is not implemented in this format. Binary encoding: s.xxxxxxx xxxxxxxx xxxxxxxx Bit weight: 0 -1 -23 -2 2 2 Largest positive mantissa $7FFFFF = +0.99999988079071044921875 Smallest positive mantissa $400000 = +0.5 Floating point zero mantissa $000000 = 0 Smallest negative mantissa $BFFFFF = -0.50000011920928955078125 Largest negative mantissa $800000 = -1.0 Reserved mantissas $000001 through $3FFFFF $C00000 through $FFFFFF Note that all reserved mantissas are illegal since they represent denormalized mantissas. e = 14 bit exponent (unsigned integer, biased by ebias = +8191) Stored as a 24 bit unsigned integer with 10 leading zeros. Exponent arithmetic is generally done with 16 bit precision. The 14 bit exponent format was chosen to maximize dynamic range with efficient detection of exponent overflow and exponent underflow. Binary encoding: 00000000 00xxxxxx xxxxxxxx. Bit weight: 13 0 2 2 Largest exponent $003FFF = 2 ** +8192 Assumed fixed point exponent $001FFF = 2 ** +0 = +1.0 Smallest exponent $000000 = 2 ** -8191 Reserved exponents $004000 through $FFFFFF 14 If bit weight 2 is set, exponent overflow has occured. 15 If bit weight 2 is set, exponent underflow has occured. Note that no distinct exponents are reserved for plus infinity, minus infinity, Not a Number (IEEE NaN), minus zero or denormalized numbers. FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT NUMBER RANGE Largest positive floating point number - m = $7FFFFF, e = $003FFF Decimal value = +0.1090748 E+2467 Smallest positive floating point number - m = $400000, e = $000000 Decimal value = +0.9168017 E-2466 Smallest negative floating point number - m = $BFFFFF, e = $000000 Decimal value = -0.9168019 E-2466 Largest positive floating point number - m = $800000, e = $003FFF Decimal value = -0.1090748 E+2467 Floating point zero - m = $000000, e = $000000 Decimal value = +0.0 Note that the two's complement mantissa does not have equal positive and negative ranges. Only sign-magnitude formats possess this property. These ranges should be checked after most arithmetic operations. FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT DSP56000/1 REGISTER USAGE Sign Only Mantissa Exponent Usage x1 x0 Input only y1 y0 Input only a2 a1 b1 Input and output r0,n0,m0 Reserved for FPLIB The library subroutines do not preserve the contents of these registers unless specifically noted in the function. Accumulator a usually contains the mantissa upon return from the subroutine. Accumulator b usually contains the exponent upon return from the subroutine. The subroutines assume that the input variables are present in the appropriate registers when the subroutine is called. FPLIB "SINGLE EXTENDED" PRECISION FLOATING POINT DSP56000/1 MEMORY USAGE The floating point mantissa and exponent may be stored in any locations in any memory space. The input and output register values are organized so that the long (L:) addressing mode may be used to load/store both the mantissa and exponent with one instruction. If the long addressing mode is used, the mantissa is in X memory and the exponent is in Y memory at the same address. COMPARISON TO ANSI/IEEE STD 754-1985 STANDARD FOR BINARY FLOATING POINT ARITHMETIC Since the IEEE Floating Point Arithmetic Standard is well publicized, it is useful to compare these two floating point formats. This floating point format (FPLIB) differs from the IEEE standard primarily in its handling of floating point exceptions. Other differences are noted in the table below. Conversion between the IEEE standard format and this format is straight-forward. CHARACTERISTIC FPLIB FORMAT IEEE FORMAT -------------- ------------ ----------- Mantissa Precision 23 bits 24 bits Hidden Leading One No Yes Mantissa Format 24 bit Two's 23 bit Unsigned Complement Fraction Magnitude Fraction Exponent Width 16 bits (14 bits 8 bits (single) used) 11 bits (double) Maximum Exponent +8192 +127 (single) +1023 (double) Minimum Exponent -8191 -127 (single) -1022 (double) Exponent Bias +8191 +127 (single) +1023 (double) Format Width 48 bits 32 bits (single) 64 bits (double) Rounding Round to Nearest Round to Nearest Round to +Infinity Round to -Infinity Round to Zero Infinity Arithmetic Saturation Limiting Affine Operations Denormalized Numbers No (Forced to Zero) Yes (With Minimum Exponent) Exceptions Divide by Zero Invalid Operations Overflow Divide by Zero Negative Square Root Overflow Underflow Inexact Arithmetic IEEE SINGLE PRECISION FORMAT _31_30______________23_22______________________0 | s | 8 bit exponent | 23 bit mantissa | |___|__________________|________________________| IEEE DOUBLE PRECISION FORMAT _63_62______________________52_51_______________________________0 | s | 11 bit exponent | 52 bit mantissa | |___|__________________________|_________________________________| As shown in the table, the FPLIB mantissa precision is one bit less than the IEEE single precision format. This is a result of using two's complement arithmetic in the DSP56000/1. The FPLIB exponent width is three bits more than the IEEE double precision format. This provides an extremely large (approx. 100,000 dB) dynamic range which eliminates exponent overflow for most applications. If exponent overflow occurs, the result is limited to the maximum representable floating point number of the correct sign. If exponent underflow occurs, the result is limited to the minimum representable floating point number, which is zero. Although the FPLIB format does not provide the arithmetic safety offered by the IEEE standard, it avoids extensive error checking and exceptions in favor of real-time execution speed and efficient implementation on the DSP56000/1. All exception conditions are handled "in-line" according to predefined rules. This accepts the fact that real-time systems have no choice but to provide an output with some amount of error if an exception occurs. It is not possible to stop execution until the application program determines a solution to the problem and fixes it. One major difference is the use of affine arithmetic in the IEEE standard versus the use of saturation arithmetic in the FPLIB format. Affine arithmetic gives separate identity to plus infinity, minus infinity, plus zero and minus zero. In operations involving these values, finite quantities remain finite and infinite quantities remain infinite. In contrast, this format gives special identity only to unsigned zero. This format performs saturation arithmetic such that any result out of the representable floating point range is replaced with the closest floating point representation. Since the dynamic range of this format is quite large, it is adequate for most applications. In the analog world, overflow is analogous to an analog op amp output clamping at the power supply rails. The IEEE floating point standard provides extensive error handling required by affine arithmetic, denormalized numbers, signaling Not a Number (NaNs) and quiet NaNs. It postpones introducing computation errors by using internal signaling and user traps to process each exception condition. Computational errors will be introduced by the application program if the calculation is completed instead of aborting the program. The FPLIB format introduces computation errors when an exception occurs in order to maintain real-time execution. An error flag (L bit in CCR) is set to inform the application program that an exception has occured. This bit will remain set until reset by the application program. The user can then eliminate the exception by algorithm modifications.