;**************************************************************************
;
;   ADPCM.HLP
;
; Help file for ADPCM.ASM - Version 1.0 - 1/31/89
;
; Updated 3/31/89 - Tested with real-time operation
;
;**************************************************************************


This help file is preliminary documentation for the DSP56001 implementation
of the CCITT 32 kbit/s ADPCM speech coding algorithm. Full documentation
will be provided in a complete applications note to be published in the near
future. A version of 32 kbit/s ADPCM that does not adhere to the CCITT
standard will be discussed in the upcoming applications note. It should
provide comparable quality to the standard version and should obtain
significantly better real-time performance but will not pass the CCITT
test vectors.


General Information:

The program ADPCM.ASM implements the algorithm defined in CCITT
Recommendation G.721: "32 kbit/s Adaptive Differential Pulse Code
Modulation" dated August 1986. For complete understanding of the DSP56001
code it is essential to refer to Recommendation G.721. The initial version
of this program is suitable for software testing on a DSP56001 ADS. It
passes all of the mu-law and A-law test sequences as defined in Appendix II
of the above document.


Usage Notes:

This version of the ADPCM algorithm simulates PCM/ADPCM I/O by the file
I/O routines on the DSP56001 ADS board. This is the most convenient method
for verifying the CCITT test vectors. Any file containing PCM speech/data
in ASCII hex characters may be used as input to the encoder. The test
files provided by the CCITT are suitable for this purpose, however the
file I/O routines on the ADS require they be in a different form than
that provided by the CCITT. Each byte of PCM data must be separated by
a space or a return. The best format for the test files is one PCM byte
of data per line. Also the algorithm assumes the 8 bits of PCM data be
in the upper 8 bits of register A1, therefore 4 zeros must be appended
to the PCM bytes in the input file (leading zeros are not needed).

The encoder writes the ADPCM output to an output file in a similar
format. In this case all 24 bits of register A1 are output. The program
puts the 4 bits of ADPCM data into the upper 4 bits of register A1
and sets the other bits to 0. Therefore the output file has 1 hex
character of data followed by 5 zeroes. This is the correct format for
input to the encoder.

For example the CCITT test file VECTOR1.MU containing mu-law PCM data
is distributed in the format:

  ffa719920f9118a34d2b9a138f119620beaf1c941090159e36369e159010941c
  afbe2096118f139b2b4da318910f9219a7fd2799128f119823cdab1a930f9116
  a03e2f9c149010951eb6b61e951090149c2f3ea016910f931babcd2398118f12
  <etc.>

This file was converted to the following format for input to the
encoder:

  ff0000
  a70000
  190000
  920000
  f0000
  910000
  180000
  <etc.>

Similarly the CCITT test file VECTOR2.MU containing mu-law ADPCM
data is distributed in the format:

  0f07080708050c02010c0509050b030d0e030a060a040c02010c0409050b040d
  0e020a050a050b020f0c0409050a030d01020b0509040b020e0d03090409030c
  01010c050a050b040e0e020a0509030b010f0d0409050b040d01020b0509040c
  <etc.>

This file was converted to the following format for input to the
decoder:

  f00000
  700000
  800000
  700000
  800000
  500000
  c00000
  <etc.>

To run data files through the algorithm requires both a PCM input
file and an ADPCM input file. As an example the following sequence
of ADS commands illustrate how to run the test files VECTOR1.MU and
VECTOR2.MU:

  load ADPCM               ;Downloads ADPCM.LOD to the ADS board
  break p:$46f             ;Sets a breakpt at the end of the main loop
  input #1 VECTOR1.MU      ;Assigns VECTOR1.MU as input to the encoder
  input #2 VECTOR2.MU      ;Assigns VECTOR2.MU as input to the decoder
  output #1 OUT1.MU        ;Assigns the encoder output to file OUT1.MU
  output #2 OUT2.MU        ;Assigns the decoder output to file OUT2.MU
  go #1 :16384             ;Runs the code through 16,384 occurances of
                           ; the above breakpt (VECTOR1.MU and VECTOR2.MU
                           ; each contain 16,384 test words)
  <after breakpt is reached>
  output off               ;Closes output files - There should be a
                           ; message indicating 16,384 words were
                           ; transferred to each file

Please note that the mu-law or A-law initialization files should be
run before the actual test files in order to get the correct output
(see Appendix II of Recommendation G.721 for details).
  


Performance specifications:

The program uses only internal X and Y memory so no external data memory
is required. However, approximatly 1250 words of total program memory is
required, so there should be at least 1K of full-speed (no wait state)
external Program RAM in order to accomodate this. The code is designed to
implement one full-duplex channel, both encode (transmit) and decode
(receive), on one DSP56001. To achieve real-time performance a DSP56001
running at 27 MHz is required. A version of this program has been run on
a 27 MHz DSP56001 with real-time speech signals. The only modification
to the test program was to provide real-time I/O. No changes to the
algorithm implementation were required. 

Current worst-case calculations for execution time show that the encoder
portion runs in real-time while the decoder portion is potentially slightly
slower than real-time. The encode algorithm takes 810 instruction cycles
(1 instruction cycle = 2 clock cycles) for the worst case delay while the
decode algorithm takes 897 cycles (not including I/O). I/O at a rate
of 8 kHz per sample gives 125 microseconds to do both an encode and
a decode. This translates into 3333 clock cycles on a DSP56001 running at
27 MHz. This allows for 833 instruction cycles each to do the encode and
the decode algorithms. The typical execution time for simulation data is
less than the worst case maximums given above. In the test of real-time
operation, the overall execution time for both the encoder and decoder
was never observed to be greater than real-time. (Note: The real-time test
INCLUDED the synchronization routines in the decoder)

The following shows the order of execution of the routines and the worst
case processing time (in instruction cycles) for each routine (a routine
defined as the code between commented sections even though some processing
for that function may not be included in this code):

      Encoder                            Decoder

FMULT (x8)     341                 FMULT (x8)     341
ACCUM           13                 ACCUM           13
LIMA             4                 LIMA             4
MIX             14                 MIX             14
EXPAND          10                 RECONST         13
SUBTA            3                 ADDA             3
LOG             22                 ANTILOG         25
SUBTB            3                 TRANS           34
QUAN            36                 ADDB             8
RECONST          7                 ADDC            19
ADDA             3                 XOR
ANTILOG         25                 UPB (x8)        76
TRANS           34                 UPA2            27
ADDB             8                 LIMC             6
ADDC            19                 UPA1            12
XOR                                LIMD             8
UPB (x8)        76                 FLOATA          22
UPA2            27                 FLOATB          27
LIMC             6                 TONE             5
UPA1            12                 TRIGB           13
LIMD             8                 FUNCTF          14
FLOATA          22                 FILTA            6
FLOATB          27                 FILTB            5
TONE             5                 SUBTC           11
TRIGB           13                 FILTC            5
FUNCTF          14                 TRIGA            3
FILTA            6                 FUNCTW           7
FILTB            5                 FILTD            5
SUBTC           11                 LIMB             4
FILTC            5                 FILTE            9
TRIGA            3                 COMPRESS        33
FUNCTW           7                 EXPAND          10
FILTD            5                 SUBTA            3
LIMB             4                 LOG             22
FILTE            8                 SUBTB            3
misc.            4                 SYNC            80
                                   misc.            7
-------------------                -------------------
TOTAL          810  Icycles        TOTAL          897  Icycles


Note: In the decoder the routines EXPAND, SUBTA, LOG, SUBTB, and SYNC
      are not necessary for the ADPCM algorithm. They are included for
      synchronization of multiple PCM/ADPCM/PCM conversions on a single
      channel. If only one PCM/ADPCM/PCM conversion is used the deletion
      of these routines should not affect the output speech quality.
      The worst-case decoder execution time will be 779 instruction
      cycles without these routines which will meet real-time constraints
      at 27 MHz. Please note that these routines ARE neccesary to pass
      the CCITT test files.


Terminology used in the DSP56001 code:

Most arithmetic symbols adhere to those in the G.721 specification and are
generally obvious with the possible exception of

    ** = power of  - i.e. 2**(-4) = 2 to the power of -4

The following symbols correspond to the variable types defined in the
G.721 specification:

    SM = signed magnitude value
    TC = two's complement value
    FL = floating point value

A number preceding one of these symbols shows the number of total bits
in a particular variable. The full binary representation of a variable,
including the location of the radix point, is given in Table 3 of Recom-
mendation G.721.

For additional information to aid in understanding the code, the contents
of registers or memory locations at certain points are detailed bit for
bit. The terminology used is shown below. 

    . = location of implied radix point
    i = integer bit
    f = fraction bit
    s = sign bit
    m = mantissa bit
    e = exponent bit
    1 = bit is always 1
    0 = bit is always 0
    X = bit value is unknown
        but is not significant

    An exception is the PCM word where
        p = sign bit
        s = segment bit
        q = quantization level bit

Example:

;   Y = 0iii i.fff | ffff ff00 | 0000 0000  (13SM)
Y_T         DS      1               ;Quantizer scale factor

    This shows of variable definition of the scale factor Y. It is
    defined as a 13-bit signed magnitude number with 4 integer bits
    and 9 fractional bits. The value stored in memory is always
    stored in the 24-bit format shown above with the implied radix
    point between bits 18 and 19.

Example:

;   A1 = 01mm mmm0 | 0000 0000 | 0000 0000 (A2=A0=0)
;   B1 = 0000 0000 | 0000 0000 | 0000 eeee (B2=B0=0)

    At the point where these comments appear in the code the register
    A1 always contains a 1 in bit 22, 5 other mantissa bits, and all
    other bits set to 0. Register B1 always contains 4 exponent bits
    in bits 0 through 3 with all other bits set to 0. Registers A0,A2,
    B0, and B2 are always set to 0.


Implementation notes:

This code is designed to be used as a stand-alone module. For users that
wish to run the standard CCITT ADPCM algorithm there should be no need to
modify the DSP56001 code that implements the standard (only the file I/O
section needs to be modified). The comments in the code are provided for
those who may wish to modify the algorithm itself for various reasons.
As noted above it is essential to refer to Recommendation G.721 to under-
stand the DSP56001 code. For those that may wish to modify the DSP56001
code the following may make the code somewhat easier to understand.

The code for this algorithm is written to obtain optimal speed (due to real-
time constraints). In almost all cases an attempt is made to implement a
given function in the most efficient manner possible on the DSP56001. The
primary goal is to make the worst case execution time as short as possible.
The secondary goal is to conserve program/data memory. This goal of overall
efficiency is the reason that most variables are NOT right justified as
implied in the G.721 specification. Since the DSP56001 uses fractional-based
arithmetic (i.e. left justified) keeping data as close to left justified as
possible results in a more efficient implementation. It does however make
the code more difficult to understand. Extra comments showing the contents
of registers at various points in the code have been added to help clarify
especially complex operations.

The primary goal of execution speed is why code that is the same for both
the encoder and the decoder is not made into subroutines. It was found that
there was too much overhead involved in calling these subroutines to allow
for real-time full-duplex performance. Not having subroutines results in
the code requiring more program memory, but allows the code to run much
faster. Even if the encoder and decoder shared routines the program would
still require external program memory so additional high-speed RAM would
still be required.

Also in an effort to improve efficiency, the parallel nature of the DSP56001
is taken advantage of whenever possible. This causes the code to "run together"
in many cases, such as getting data for the next section of code before the
current section has been completed. This does however make the execution time
of a particular routine more difficult to define exactly. 

A more detailed explanation of the DSP56001 code will be provided in the
upcoming applications note, however the comments in the actual code will
still provide the best aid in understanding the code.

As noted above, a version of this program has been tested with real-time
speech signals. The I/O section was modified to transmit and receive PCM
data (64 kbits/s) to/from a CODEC connected to the SSI port of the DSP56001.
For test purposes the received data from the CODEC was run through the
encoder routine, sent directly to the decoder routine, and the decoded PCM
data sent out to the same CODEC. The modification to talk to the CODEC was
relatively simple and no modifications were made to the algorithm implementa-
tion. The operation of the test set-up with sample speech signals was performed
and the output quality was found to be comparable to standard PCM encoding/
decoding. Exhaustive tests with many types of voice-band telephone signals
have not been performed, but since the code passes all of the CCITT test
vectors the user may have reasonable confidence in the performance of this
code.

