;************************************************************************** ; ; ADPCM.HLP ; ; Help file for ADPCM.ASM - Version 1.0 - 1/31/89 ; ; Updated 3/31/89 - Tested with real-time operation ; ;************************************************************************** This help file is preliminary documentation for the DSP56001 implementation of the CCITT 32 kbit/s ADPCM speech coding algorithm. Full documentation will be provided in a complete applications note to be published in the near future. A version of 32 kbit/s ADPCM that does not adhere to the CCITT standard will be discussed in the upcoming applications note. It should provide comparable quality to the standard version and should obtain significantly better real-time performance but will not pass the CCITT test vectors. General Information: The program ADPCM.ASM implements the algorithm defined in CCITT Recommendation G.721: "32 kbit/s Adaptive Differential Pulse Code Modulation" dated August 1986. For complete understanding of the DSP56001 code it is essential to refer to Recommendation G.721. The initial version of this program is suitable for software testing on a DSP56001 ADS. It passes all of the mu-law and A-law test sequences as defined in Appendix II of the above document. Usage Notes: This version of the ADPCM algorithm simulates PCM/ADPCM I/O by the file I/O routines on the DSP56001 ADS board. This is the most convenient method for verifying the CCITT test vectors. Any file containing PCM speech/data in ASCII hex characters may be used as input to the encoder. The test files provided by the CCITT are suitable for this purpose, however the file I/O routines on the ADS require they be in a different form than that provided by the CCITT. Each byte of PCM data must be separated by a space or a return. The best format for the test files is one PCM byte of data per line. Also the algorithm assumes the 8 bits of PCM data be in the upper 8 bits of register A1, therefore 4 zeros must be appended to the PCM bytes in the input file (leading zeros are not needed). The encoder writes the ADPCM output to an output file in a similar format. In this case all 24 bits of register A1 are output. The program puts the 4 bits of ADPCM data into the upper 4 bits of register A1 and sets the other bits to 0. Therefore the output file has 1 hex character of data followed by 5 zeroes. This is the correct format for input to the encoder. For example the CCITT test file VECTOR1.MU containing mu-law PCM data is distributed in the format: ffa719920f9118a34d2b9a138f119620beaf1c941090159e36369e159010941c afbe2096118f139b2b4da318910f9219a7fd2799128f119823cdab1a930f9116 a03e2f9c149010951eb6b61e951090149c2f3ea016910f931babcd2398118f12 This file was converted to the following format for input to the encoder: ff0000 a70000 190000 920000 f0000 910000 180000 Similarly the CCITT test file VECTOR2.MU containing mu-law ADPCM data is distributed in the format: 0f07080708050c02010c0509050b030d0e030a060a040c02010c0409050b040d 0e020a050a050b020f0c0409050a030d01020b0509040b020e0d03090409030c 01010c050a050b040e0e020a0509030b010f0d0409050b040d01020b0509040c This file was converted to the following format for input to the decoder: f00000 700000 800000 700000 800000 500000 c00000 To run data files through the algorithm requires both a PCM input file and an ADPCM input file. As an example the following sequence of ADS commands illustrate how to run the test files VECTOR1.MU and VECTOR2.MU: load ADPCM ;Downloads ADPCM.LOD to the ADS board break p:$46f ;Sets a breakpt at the end of the main loop input #1 VECTOR1.MU ;Assigns VECTOR1.MU as input to the encoder input #2 VECTOR2.MU ;Assigns VECTOR2.MU as input to the decoder output #1 OUT1.MU ;Assigns the encoder output to file OUT1.MU output #2 OUT2.MU ;Assigns the decoder output to file OUT2.MU go #1 :16384 ;Runs the code through 16,384 occurances of ; the above breakpt (VECTOR1.MU and VECTOR2.MU ; each contain 16,384 test words) output off ;Closes output files - There should be a ; message indicating 16,384 words were ; transferred to each file Please note that the mu-law or A-law initialization files should be run before the actual test files in order to get the correct output (see Appendix II of Recommendation G.721 for details). Performance specifications: The program uses only internal X and Y memory so no external data memory is required. However, approximatly 1250 words of total program memory is required, so there should be at least 1K of full-speed (no wait state) external Program RAM in order to accomodate this. The code is designed to implement one full-duplex channel, both encode (transmit) and decode (receive), on one DSP56001. To achieve real-time performance a DSP56001 running at 27 MHz is required. A version of this program has been run on a 27 MHz DSP56001 with real-time speech signals. The only modification to the test program was to provide real-time I/O. No changes to the algorithm implementation were required. Current worst-case calculations for execution time show that the encoder portion runs in real-time while the decoder portion is potentially slightly slower than real-time. The encode algorithm takes 810 instruction cycles (1 instruction cycle = 2 clock cycles) for the worst case delay while the decode algorithm takes 897 cycles (not including I/O). I/O at a rate of 8 kHz per sample gives 125 microseconds to do both an encode and a decode. This translates into 3333 clock cycles on a DSP56001 running at 27 MHz. This allows for 833 instruction cycles each to do the encode and the decode algorithms. The typical execution time for simulation data is less than the worst case maximums given above. In the test of real-time operation, the overall execution time for both the encoder and decoder was never observed to be greater than real-time. (Note: The real-time test INCLUDED the synchronization routines in the decoder) The following shows the order of execution of the routines and the worst case processing time (in instruction cycles) for each routine (a routine defined as the code between commented sections even though some processing for that function may not be included in this code): Encoder Decoder FMULT (x8) 341 FMULT (x8) 341 ACCUM 13 ACCUM 13 LIMA 4 LIMA 4 MIX 14 MIX 14 EXPAND 10 RECONST 13 SUBTA 3 ADDA 3 LOG 22 ANTILOG 25 SUBTB 3 TRANS 34 QUAN 36 ADDB 8 RECONST 7 ADDC 19 ADDA 3 XOR ANTILOG 25 UPB (x8) 76 TRANS 34 UPA2 27 ADDB 8 LIMC 6 ADDC 19 UPA1 12 XOR LIMD 8 UPB (x8) 76 FLOATA 22 UPA2 27 FLOATB 27 LIMC 6 TONE 5 UPA1 12 TRIGB 13 LIMD 8 FUNCTF 14 FLOATA 22 FILTA 6 FLOATB 27 FILTB 5 TONE 5 SUBTC 11 TRIGB 13 FILTC 5 FUNCTF 14 TRIGA 3 FILTA 6 FUNCTW 7 FILTB 5 FILTD 5 SUBTC 11 LIMB 4 FILTC 5 FILTE 9 TRIGA 3 COMPRESS 33 FUNCTW 7 EXPAND 10 FILTD 5 SUBTA 3 LIMB 4 LOG 22 FILTE 8 SUBTB 3 misc. 4 SYNC 80 misc. 7 ------------------- ------------------- TOTAL 810 Icycles TOTAL 897 Icycles Note: In the decoder the routines EXPAND, SUBTA, LOG, SUBTB, and SYNC are not necessary for the ADPCM algorithm. They are included for synchronization of multiple PCM/ADPCM/PCM conversions on a single channel. If only one PCM/ADPCM/PCM conversion is used the deletion of these routines should not affect the output speech quality. The worst-case decoder execution time will be 779 instruction cycles without these routines which will meet real-time constraints at 27 MHz. Please note that these routines ARE neccesary to pass the CCITT test files. Terminology used in the DSP56001 code: Most arithmetic symbols adhere to those in the G.721 specification and are generally obvious with the possible exception of ** = power of - i.e. 2**(-4) = 2 to the power of -4 The following symbols correspond to the variable types defined in the G.721 specification: SM = signed magnitude value TC = two's complement value FL = floating point value A number preceding one of these symbols shows the number of total bits in a particular variable. The full binary representation of a variable, including the location of the radix point, is given in Table 3 of Recom- mendation G.721. For additional information to aid in understanding the code, the contents of registers or memory locations at certain points are detailed bit for bit. The terminology used is shown below. . = location of implied radix point i = integer bit f = fraction bit s = sign bit m = mantissa bit e = exponent bit 1 = bit is always 1 0 = bit is always 0 X = bit value is unknown but is not significant An exception is the PCM word where p = sign bit s = segment bit q = quantization level bit Example: ; Y = 0iii i.fff | ffff ff00 | 0000 0000 (13SM) Y_T DS 1 ;Quantizer scale factor This shows of variable definition of the scale factor Y. It is defined as a 13-bit signed magnitude number with 4 integer bits and 9 fractional bits. The value stored in memory is always stored in the 24-bit format shown above with the implied radix point between bits 18 and 19. Example: ; A1 = 01mm mmm0 | 0000 0000 | 0000 0000 (A2=A0=0) ; B1 = 0000 0000 | 0000 0000 | 0000 eeee (B2=B0=0) At the point where these comments appear in the code the register A1 always contains a 1 in bit 22, 5 other mantissa bits, and all other bits set to 0. Register B1 always contains 4 exponent bits in bits 0 through 3 with all other bits set to 0. Registers A0,A2, B0, and B2 are always set to 0. Implementation notes: This code is designed to be used as a stand-alone module. For users that wish to run the standard CCITT ADPCM algorithm there should be no need to modify the DSP56001 code that implements the standard (only the file I/O section needs to be modified). The comments in the code are provided for those who may wish to modify the algorithm itself for various reasons. As noted above it is essential to refer to Recommendation G.721 to under- stand the DSP56001 code. For those that may wish to modify the DSP56001 code the following may make the code somewhat easier to understand. The code for this algorithm is written to obtain optimal speed (due to real- time constraints). In almost all cases an attempt is made to implement a given function in the most efficient manner possible on the DSP56001. The primary goal is to make the worst case execution time as short as possible. The secondary goal is to conserve program/data memory. This goal of overall efficiency is the reason that most variables are NOT right justified as implied in the G.721 specification. Since the DSP56001 uses fractional-based arithmetic (i.e. left justified) keeping data as close to left justified as possible results in a more efficient implementation. It does however make the code more difficult to understand. Extra comments showing the contents of registers at various points in the code have been added to help clarify especially complex operations. The primary goal of execution speed is why code that is the same for both the encoder and the decoder is not made into subroutines. It was found that there was too much overhead involved in calling these subroutines to allow for real-time full-duplex performance. Not having subroutines results in the code requiring more program memory, but allows the code to run much faster. Even if the encoder and decoder shared routines the program would still require external program memory so additional high-speed RAM would still be required. Also in an effort to improve efficiency, the parallel nature of the DSP56001 is taken advantage of whenever possible. This causes the code to "run together" in many cases, such as getting data for the next section of code before the current section has been completed. This does however make the execution time of a particular routine more difficult to define exactly. A more detailed explanation of the DSP56001 code will be provided in the upcoming applications note, however the comments in the actual code will still provide the best aid in understanding the code. As noted above, a version of this program has been tested with real-time speech signals. The I/O section was modified to transmit and receive PCM data (64 kbits/s) to/from a CODEC connected to the SSI port of the DSP56001. For test purposes the received data from the CODEC was run through the encoder routine, sent directly to the decoder routine, and the decoded PCM data sent out to the same CODEC. The modification to talk to the CODEC was relatively simple and no modifications were made to the algorithm implementa- tion. The operation of the test set-up with sample speech signals was performed and the output quality was found to be comparable to standard PCM encoding/ decoding. Exhaustive tests with many types of voice-band telephone signals have not been performed, but since the code passes all of the CCITT test vectors the user may have reasonable confidence in the performance of this code.