Programming dsPIC MCU in PASCAL

Chapter 11: DSP Engine

Introduction

The digital signal processing (DSP) module is the part of the device specialized for fast execution of the basic mathematical operations (addition, subtraction and multiplication) and for automatic accumulation, logical shifting, rounding and saturation. This module makes the dsPIC30F devices very powerful and considerably extends the scope of their applications.

Processing of digital signals is very demanding. One of the biggest problems is the multiplication it requires. The dsPIC30F family of devices has a hardware multiplier which considerably accelerates the processing. The major part of digital signal processing reduces to calculating the sum of products of two arrays. This module has been designed to allow a fast calculation of the sum of products:

A = a1*b1 + a2*b2 + ... + an*bn

Block diagram of the DSP module

The block diagram of the DSP module is shown in Fig. 11-1.


Fig. 11-1 DSP module block diagram

Fig. 11-1 illustrates the realization of the DSP module. In order to calculate the sum of products as fast as possible, the following additions have been made:

  1. A second data bus for reading the operands (array elements). This gives two data buses, X and Y. The advantage of this approach is that two samples can arrive at the input of the DSP module simultaneously, be multiplied, and have their product added to the partial sum already held in the accumulator.
  2. A very fast hardware multiplier. Because of their complexity and size, multipliers are difficult to build into most microprocessors and microcontrollers. For this reason a multiplier is included only in devices intended for applications where it is absolutely necessary, such as digital signal processing. The result of a multiplication is a 32-bit quantity.
  3. High precision, so that the sum of products of a large number of array elements can be calculated with sufficient accuracy. Even though 32 bits are sufficient for saving the product of two 16-bit values, 8 bits have been added to extend the range of the accumulated sum. This means that the accumulator holds 40-bit data.
  4. A barrel shifter for automatic hardware shifting of values from the multiplier, accumulator, or data bus. This eliminates the need for any code that shifts a value (i.e. multiplies or divides it by 2^n), which accelerates the processing.
  5. An adder independent of the multiplier and the other parts of the DSP module, which allows several DSP operations to be executed in parallel. For example, two array elements have been multiplied and their product should be added to the accumulator. While the adder adds the value from the multiplier to the value in the accumulator, the multiplier already processes the next two array elements. This pipelining allows one partial sum of products to be processed in one instruction cycle.
  6. Hardware rounding, which is necessary at the end of the calculation of the sum of products because the intermediate results are kept with a higher precision than required in order to reduce the calculation error. The rounding can be convergent or conventional (non-convergent). This part of the DSP module further shortens the corresponding code.
  7. Hardware saturation logic, which may, but need not, be enabled. This logic prevents unwanted events at overflow. Namely, 40 bits may sometimes be insufficient, particularly when summing over two long arrays whose elements have high numerical values. Enabling this logic is recommended since it mitigates the errors and unwanted phenomena that arise while calculating the sum of products of two large arrays.

11.1 X and Y data buses


The process of calculation of the sum of products comprises several steps:

  1. The first element of the first array and the first element of the second array (a1, b1) are taken first.
  2. The two values are then multiplied.
  3. The result of the multiplication, called a partial sum, is added to the value already saved in the accumulator: A = A + partial sum.
  4. The process is continued with the next two array elements, until the last two elements have been multiplied.
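The steps above can be sketched in a few lines. The following Python model only illustrates the arithmetic and is not code for the dsPIC itself; the function name `sum_of_products` and the sample arrays are assumptions made for this sketch.

```python
def sum_of_products(a, b):
    """Accumulate a[i] * b[i] over two equally long arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    acc = 0                      # accumulator A starts at zero
    for x, y in zip(a, b):       # elements with identical indices
        partial = x * y          # one partial sum
        acc = acc + partial      # A = A + partial sum
    return acc


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(sum_of_products(a, b))     # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

On the device, the loop body collapses into a single hardware operation per pair of elements, which is what the rest of this chapter describes.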

As can be noticed, the process starts by reading two array elements, the first of the first array and the first of the second array. Several details need to be considered. Firstly, the array elements being multiplied are from different arrays. This means that the values of these elements are not saved in adjacent locations. Secondly, the elements always have identical indices, i.e. the address changes are always identical. E.g. if the arrays consist of 16-bit elements, then after the calculation of each partial sum the addresses of the next two elements have to be increased or decreased by 2, depending on whether the array is processed with increasing or decreasing index. The change for such arrays will always be ±2. This is important because it allows the change to be carried out by hardware!

Reading the array elements to be multiplied can be accelerated by accessing simultaneously both memory locations. For this purpose it is required that the microcontroller has two data buses, two address generators and that the structure of the memory allows simultaneous access to different locations. The devices of the dsPIC30F family meet these requirements. The data buses are called X and Y bus, as shown in Fig. 11-1. There are two address generators (it is required to calculate simultaneously both addresses to be read). A multiple access to the memory is provided.

The X and Y data buses are considered here. To facilitate the realization, some constraints have to be introduced. Data memory where the elements of the arrays are saved has been split in two sections, X and Y data spaces, as shown in Fig. 11-2.


Fig. 11-2 Organizations of data memories of dsPIC30F4013 and dsPIC30F6014A devices

Fig. 11-2 shows examples of data memory organization for the dsPIC30F4013 and dsPIC30F6014A devices. The memory capacity, i.e. the size of the X and Y data spaces, is device specific. The space for the special function registers (SFR) remains the same, and so does the starting address of the X space.

Splitting data memory to the X and Y data spaces introduces the constraint that each data bus can have access only to one of the spaces. An attempt to access the other space (e.g. an attempt to access X data space via Y data bus) will result in generation of a trap or, if the procedure for processing a trap has not been specified, in device reset.

The existence of constraints when using the DSP instructions has already been mentioned. The principal constraints when reading data are:

  1. The X data bus has access only to the X space in the data memory, or to the extended X space, which will be discussed later,
  2. The Y data bus has access only to the Y space, which, unlike the X space, has no extension,
  3. The values to be sent to the multiplier for processing have to be in the general purpose registers W4...W7, specifically W4 and W5 for the X space and W6 and W7 for the Y space. The general purpose registers for the address pointers indicating the array elements to be read are W8...W11, specifically W8 and W9 for the X space and W10 and W11 for the Y space.

The practice is as follows:

  1. Load the address of the next array element from the X space to the register W8 or W9.
  2. Simultaneously, load the address of the next array element from the Y space to the register W10 or W11.
  3. Then, simultaneously read both array elements and load the corresponding values to the W4 or W5 register for the X array and to the W6 or W7 register for the Y array.
  4. The values of the registers W8/W9 and W10/W11 are automatically incremented by 2 (if the array elements are 16-bit quantities).

Of course, all this is carried out by the hardware. The code only has to specify the address step (increase or decrease) used when calculating the partial sums, the initial and final addresses of the arrays, and the registers used for loading the addresses and reading the array elements.

The most frequently used DSP instruction is MAC. The following example shows one of the forms of its use.

Example:
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6

The instruction from this example:

  1. Multiplies the values in the registers W4 and W6,
  2. Adds the result of the multiplication to the accumulator A,
  3. Loads the value of the next element from the X space address pointed to by the register W8 into the register W4,
  4. After reading the array element in the X space, increments the value of the register W8 to point at the location of the next element of the X array,
  5. Loads the value of the next element from the Y space address pointed to by the register W10 into the register W6,
  6. After reading the array element in the Y space, increments the value of the register W10 to point at the location of the next element of the Y array.

The hardware specialized for DSP instructions allows all this to be executed in one instruction cycle! As can be seen, the DSP module makes the devices of the dsPIC30F family very powerful. If the device clock is 80MHz, the instruction clock is 20MHz, i.e. 20 million MAC instructions can be executed in one second, each including all six actions listed above!
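To make the six actions concrete, here is a minimal Python model of a single execution of the MAC instruction from the example. It is a sketch for reasoning about register contents, not device code: the dictionary-based register file and memory, the function name `mac_step` and the addresses 0x0800 and 0x0C00 are all assumptions chosen for illustration.

```python
def mac_step(regs, mem, acc):
    """Model of: MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6."""
    acc += regs["W4"] * regs["W6"]      # actions 1 and 2: multiply, accumulate
    regs["W4"] = mem[regs["W8"]]        # action 3: prefetch next X element
    regs["W8"] += 2                     # action 4: advance the X pointer
    regs["W6"] = mem[regs["W10"]]       # action 5: prefetch next Y element
    regs["W10"] += 2                    # action 6: advance the Y pointer
    return acc


# Hypothetical memory image: byte address -> 16-bit word
mem = {0x0800: 3, 0x0802: 4, 0x0C00: 10, 0x0C02: 20}
regs = {"W4": 1, "W6": 2, "W8": 0x0800, "W10": 0x0C00}

acc = mac_step(regs, mem, 0)            # acc = 1*2
acc = mac_step(regs, mem, acc)          # acc = 1*2 + 3*10
print(acc, hex(regs["W8"]), hex(regs["W10"]))   # 32 0x804 0xc04
```

Note that the multiplication always uses the values fetched by the previous step, which is why the first pair of operands has to be in W4 and W6 before the first MAC is executed.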

11.2 PSV management

It is customary that the X space contains stationary arrays, e.g. digital filter coefficients, FFT blocks, etc., whereas the Y space contains dynamic arrays, e.g. samples of the signal being processed. For this reason it was made possible to map one section of the program memory into the X space. This means that the coefficients of a digital filter may be saved in the program memory as constants, that this section of the program memory is declared an extension of the X space, and that it is accessible as if it were in the data memory. Of course, the addresses of these elements will not be the same as in the program memory; they will start from the address 0x8000, as can be seen in Fig. 11-2. In this way additional capacity for saving coefficients is obtained, which is essential particularly for high order filters, FFT algorithms and many other applications. This procedure is known as Program Space Visibility (PSV) Management.

Address generation in PSV management is shown in Fig. 11-3.


Fig. 11-3 Address generation in PSV management

The array element to be read is in the program memory, which has 24-bit addresses, whereas DSP instructions can operate only with 16-bit addresses. The remaining 8 bits are obtained from the PSVPAG register. The whole procedure reduces to writing the upper bits (bits 22...15) of the 24-bit program memory address to the PSVPAG register and activating PSV management; the array is then accessed as if it were in the data memory. The hardware adds the 8 bits from PSVPAG to each address, as shown in Fig. 11-3. In this way a 24-bit address is generated and the array element is read correctly.

Example:
PSV management is enabled. The array in the program memory starts from the address 0x108000
W1=0x8000
PSVPAG=0x21
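In the binary system this is

PSVPAG = 0x21   = 0010 0001
W1     = 0x8000 = 1000 0000 0000 0000
EA     = 0 00100001 000000000000000 = 0001 0000 1000 0000 0000 0000 = 0x108000

The leading zero of EA is its most significant bit, which is set to zero by hardware (see Fig. 11-3). It is followed by the 8 bits of PSVPAG and the 15 least significant bits of W1. EA denotes the Effective Address in the program memory.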

As Fig. 11-3 shows, only the 15 least significant bits of the 16-bit data address are used; its highest bit is logic one, which denotes an access through PSV management. The 8 bits from the PSVPAG register are placed in front of these 15 bits, and the highest bit of the program memory address is logic zero. In this way the addresses of the array elements in the program memory are obtained. All this is done automatically, i.e. no attention has to be paid to it when writing a program. All that has to be done is to set the corresponding value in the PSVPAG register and activate PSV management in the CORCON register. The structure of the CORCON register is given at the end of this chapter.
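The address generation described above amounts to a shift and a bitwise OR. The following Python sketch reproduces it for the example values; the function name `psv_program_address` is an assumption made for illustration, and the code models only the address arithmetic, not the memory access itself.

```python
def psv_program_address(psvpag, data_ea):
    """Form the 24-bit program memory address for a PSV access."""
    if not data_ea & 0x8000:
        raise ValueError("address is below 0x8000, not in the PSV area")
    low15 = data_ea & 0x7FFF                 # bit 15 only selects the PSV area
    return ((psvpag & 0xFF) << 15) | low15   # bit 23 is always zero


print(hex(psv_program_address(0x21, 0x8000)))   # 0x108000
```

Going the other way, the PSVPAG value for a given program memory address is the address shifted right by 15 bits, which is why the array at 0x108000 needs PSVPAG = 0x21.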

11.3 Multiplier

One of the essential improvements which made possible the execution of an instruction in one instruction cycle is the hardware multiplier. The input data are 16-bit quantities and the 32-bit output data are extended to 40 bits in order to facilitate adding to the current value in the accumulator. If this multiplier did not exist, multiplying 16-bit quantities would require 16 instruction cycles, which would be unacceptably long for many digital signal processing applications.

The values to be multiplied are fed simultaneously to the input of the multiplier via the X and Y data buses. The output of the multiplier is fed to the 40-bit extension block, which retains the sign.

By multiplying two 16-bit values one obtains a 32-bit value. However, the aim is not only to multiply but also to calculate the sum of the partial products for the whole array. Therefore, multiplying is only one part and the result is a partial sum. The number of partial sums corresponds to the length of the array. It follows that 32 bits will not be sufficient for saving the result because the probability of overflow is high. For this reason the value being accumulated has to be extended. The extension adopted is 8 bits, resulting in a total of 40 bits.

In the worst case, when all partial sums have the maximum 32-bit value, one can add 256 partial sums before overflow. This means that the maximum length of arrays whose 32-bit products all have the maximum value is 256. For most applications this is sufficient, and since such arrays are very rare, the permissible array lengths are in practice several times longer.
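The figure of 256 follows directly from the 8 extension bits, and it can be checked with a few lines of arithmetic. The Python sketch below assumes signed operation; the constant and function names are hypothetical.

```python
ACC_BITS = 40
ACC_MAX = 2 ** (ACC_BITS - 1) - 1        # largest signed 40-bit value
ACC_MIN = -(2 ** (ACC_BITS - 1))         # smallest signed 40-bit value
PRODUCT_MAX = 2 ** 31 - 1                # largest signed 32-bit value


def fits_in_accumulator(value):
    """True if `value` can be held in a signed 40-bit accumulator."""
    return ACC_MIN <= value <= ACC_MAX


print(fits_in_accumulator(256 * PRODUCT_MAX))   # True
print(fits_in_accumulator(257 * PRODUCT_MAX))   # False
```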

Depending on the values of individual bits in the CORCON register, the multiplication may be carried out with signed or unsigned integers, or with signed or unsigned fractional values in the 1.15 format (1 bit for sign and 15 bits for value). The most often used format is fractional (radix).
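As an illustration of the signed 1.15 format, the following Python sketch converts real numbers to 16-bit fractional values and multiplies them. The helper names are assumptions, and the sketch models only the number format: how the hardware aligns the product inside the accumulator is not modeled here.

```python
def to_q15(x):
    """Convert a real number in [-1, 1) to a signed 1.15 integer."""
    if not -1.0 <= x < 1.0:
        raise ValueError("value outside the 1.15 range")
    return min(int(round(x * 2 ** 15)), 2 ** 15 - 1)


def q15_product(a, b):
    """Multiply two 1.15 values; the result has 30 fractional bits."""
    return a * b


def from_fixed(value, frac_bits):
    """Convert a fixed-point integer back to a real number."""
    return value / 2 ** frac_bits


half = to_q15(0.5)
quarter = to_q15(0.25)
p = q15_product(half, quarter)
print(hex(half), hex(quarter), from_fixed(p, 30))   # 0x4000 0x2000 0.125
```

Because both operands are smaller than 1 in magnitude, the product never grows, which is one reason why the fractional format is popular in filter calculations.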

11.4 Barrel shifter

The barrel shifter automatically shifts values from the multiplier or accumulator by an arbitrary number of bits to the right or to the left. This operation is often required when scaling a partial sum or the total sum. The barrel shifter has been added in order to simplify the code. The shifting is carried out in parallel with the execution of the instruction.

The input to the barrel shifter is 40-bit and the output may be 40-bit or 16-bit. If the value from the accumulator is scaled and the result should be fed back to the accumulator, then the output is 40-bit. It is possible to scale the result from the accumulator and save the obtained value in the memory as the final result of the calculation.
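The effect of the barrel shifter on a 40-bit value can be modeled as follows. This Python sketch assumes that right shifts preserve the sign and that a positive shift count means a shift to the right (the convention used by the SAC instruction later in this chapter); the function names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def to_signed40(raw):
    """Interpret a 40-bit pattern as a two's complement value."""
    raw &= MASK40
    return raw - (1 << 40) if raw & (1 << 39) else raw


def barrel_shift(acc, shift):
    """Shift right for positive `shift`, left for negative, within 40 bits."""
    if shift >= 0:
        return to_signed40(acc >> shift)   # sign-preserving right shift
    return to_signed40(acc << -shift)      # bits shifted past bit 39 are lost


print(barrel_shift(0x4000, 2))    # 4096, i.e. divided by 4
print(barrel_shift(-8, 1))        # -4
print(barrel_shift(3, -4))        # 48, i.e. multiplied by 16
```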

11.5 Adder

A part of the DSP module is a 40-bit adder which may save the result in one of the two 40-bit accumulators. The activation of the saturation logic is optional. The adder is required for accumulation of the partial sums. Adding or subtracting of the partial sums is performed automatically as a part of DSP instructions; no additional code is required, which allows extremely short signal processing times.

An example which illustrates the significance of a hardware, independent adder is the previous example of the MAC instruction.

Example:
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6
The instruction of this example:

  1. Multiplies the values from the registers W4 and W6,
  2. The result of the multiplication is added to the value in the accumulator,
  3. From the address in the X space pointed by the register W8 the value of the next array element is loaded to the register W4,
  4. After reading the array element in the X space, the register W8 is incremented to point at the next array element in the X space,
  5. From the address in the Y space pointed by the register W10 the value of the next array element is loaded to the register W6,
  6. After reading the array element in the Y space, the register W10 is incremented to point at the next array element in the Y space.

It is important that this DSP instruction is executed in one instruction cycle! This means that the whole algorithm for calculating the sum of products consists of loading arrays to the memory, adjusting the parameters of the DSP module (format, positions of the arrays, etc) and then the above instruction is called the corresponding number of times.

Example:
CLR A
REPEAT #20
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6

The result of the execution of the above program is calling the MAC instruction 21 times (REPEAT #20 means that the next instruction will be executed 20+1 times). If the first array elements have been loaded to the registers W4 and W6 and the initial addresses in the data memory or extended data memory (PSV management) loaded to the registers W8 and W10 before the execution of the program, then, upon completion of the REPEAT loop, the accumulator will contain the sum of products of the first 21 elements of the two arrays.

This section of the code occupies 3 locations in the program memory and takes 23 instruction cycles (1 for CLR, 1 for REPEAT and 21 for MAC). If the device clock is 80MHz (20MHz instruction clock), then the program will be executed in 23*50 = 1150ns!
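The cycle count of such a loop can be worked out for any REPEAT literal. The Python sketch below assumes, as in the text, one instruction cycle each for the clear instruction, for REPEAT and for every pass of the repeated instruction, and four clock periods per instruction cycle; the function name and the sample literal are assumptions for illustration.

```python
def repeat_loop_timing(repeat_literal, fosc_hz):
    """Return (passes, instruction cycles, time in ns) for CLR + REPEAT + MAC."""
    passes = repeat_literal + 1        # REPEAT #n runs the next instruction n+1 times
    cycles = 1 + 1 + passes            # CLR + REPEAT + one cycle per MAC pass
    cycle_ns = 4 * 1e9 / fosc_hz       # four clock periods per instruction cycle
    return passes, cycles, cycles * cycle_ns


print(repeat_loop_timing(9, 80_000_000))   # (10, 12, 600.0)
```

To process exactly N pairs of elements, the literal passed to REPEAT therefore has to be N-1.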

It should be noted that without an independent adder which may carry out the operations simultaneously with the multiplier and other parts of the DSP module, this would not be possible. Then the parallelism in the execution of instructions would not be possible and the execution of one DSP instruction would last at least one clock longer.

Two 40-bit accumulators for saving the partial sums are available. These are accumulator A and accumulator B (ACCA and ACCB). The accumulators are mapped in the data memory and occupy 3 memory locations each. The addresses and distribution of the accumulator bits are given at the end of this chapter.

11.6 Round logic

Data accumulation is carried out with 40-bit data. The architecture of the dsPIC30F devices, however, is 16-bit, meaning that a conversion to 16-bit values has to be done. The higher number of bits used for calculating the sums by DSP instructions is intended to increase the accuracy and the operating range of values and so reduce the error owing to the finite word length. The purpose of individual bits within the accumulator is presented in Fig. 11-4.


Fig. 11-4 Purpose of individual bits within accumulator

As can be seen from Fig. 11-4, the upper 8 bits serve for extending the range and the lower 16 bits for increasing the operational accuracy. Increasing the range is sometimes useful as an intermediate step, but the end result should not overrun the basic range. If this occurs, the result will not be correct. The consequences can be mitigated by enabling the saturation logic, but not completely neutralized.

After the calculation is completed, the result has to be saved as a 16-bit quantity. To do that, the accuracy of the result has to be reduced. To perform this automatically, a block has been added to the DSP module which rounds off the result during an accumulator write back. From Fig. 11-1 it can be seen that the round logic is placed between the accumulator and the X data bus. If the round logic is on (in the CORCON register), a separate instruction rounds off the result from the accumulator and saves it in the desired location in the data memory.

The round logic can perform a conventional (biased) or convergent (unbiased) round function.

The conventional round function works as follows. If the MS bit of the accuracy-increase bits (bit 15) is logic one, the result is incremented by one step. One step is the smallest positive value that can be saved in the selected format: 1 for integers, or 2^-15 = 0.000030517578125 for the 1.15 fractional format. A consequence of this algorithm is that over a succession of random rounding operations the value tends to be biased slightly positive.

The convergent round function, assuming that bit 16 is effectively random in nature, removes any rounding bias that may accumulate. If the convergent round function is enabled, the LS bit of the result is incremented by 1 if the value saved in the 16 accuracy-increase bits is greater than 0x8000, or if this value is equal to 0x8000 and bit 16 is one. This algorithm can readily be explained by using integers. If the middle 16 bits contain an integer, the rounding algorithm will tend towards even integers. This is demonstrated by the examples in Table 11-1.

| Value to be rounded | The result | Binary form of the value to be rounded | Binary form of the result |
|---|---|---|---|
| 12.75 | 13 | 0000 0000 0000 0000 0000 1100 1100 0000 0000 0000 | 0000 0000 0000 1101 |
| 12.5 | 12 | 0000 0000 0000 0000 0000 1100 1000 0000 0000 0000 | 0000 0000 0000 1100 |
| 13.5 | 14 | 0000 0000 0000 0000 0000 1101 1000 0000 0000 0000 | 0000 0000 0000 1110 |

Table 11-1 Convergent round function.
The convergent round function usually gives better results and its use is recommended.
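The difference between the two round functions can be seen by applying both to the same values. The Python sketch below follows the descriptions above for a value whose lower 16 bits are the accuracy-increase bits; the function names and the helper `fixed` are assumptions made for illustration.

```python
def round_conventional(acc):
    """Biased rounding: add half a step, then drop the low 16 bits."""
    return (acc + 0x8000) >> 16


def round_convergent(acc):
    """Unbiased rounding: exact ties go to the even result."""
    high = acc >> 16             # bits 16 and up: the value that is kept
    low = acc & 0xFFFF           # bits 15..0: the accuracy-increase bits
    if low > 0x8000 or (low == 0x8000 and high & 1):
        high += 1
    return high


def fixed(x):
    """Place a number with a binary fraction in the accumulator layout."""
    return int(x * 2 ** 16)


for x in (12.75, 12.5, 13.5):
    print(x, round_conventional(fixed(x)), round_convergent(fixed(x)))
# 12.75 13 13
# 12.5 13 12
# 13.5 14 14
```

The two functions differ only for exact ties such as 12.5, where the conventional function always rounds up while the convergent function rounds to the even neighbor.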

11.7 Saturation logic

When calculating a sum of products of two arrays comprising many elements (more than 256), there is a risk of exceeding the range. In this case, the obtained value is not only inaccurate but also of the opposite sign. These sudden changes of values of a signal (known as glitches) are easily recognized because they violate the characteristics of a signal.

The consequences can be mitigated if the saturation logic is enabled. If an overrun occurs while executing the current instruction, the hardware saturation logic will load the maximum positive or maximum negative value to the operating accumulator, depending on the previous value loaded to the accumulator. In this way the consequences of a range overrun are mitigated. Fig. 11-6 shows the case of an output sinusoidal signal when an overrun occurred, without and with enabled saturation logic.


Fig. 11-6 Consequences of range overrun without (left) and with (right) enabled saturation logic

The figure shows that if an overrun occurs and the saturation logic is not enabled, the consequences are far greater than when the saturation logic is enabled. In the first case a glitch appears which considerably violates the characteristics of the signal. In the second case, owing to the enabled saturation logic, the only consequence is the unwanted clipping of the crest of the sinusoidal signal, which is much better than the first case. With the saturation logic enabled, a lesser overrun corresponds to a lesser consequence, whereas with the saturation logic disabled this does not apply.

There are three modes of operation of the saturation logic: accumulator 39-bit saturation, accumulator 31-bit saturation and write-back saturation. In the first case, overrun is allowed until the MS bit (corresponding to the sign in signed operations) is overrun. This is an optimistic version, because it is assumed that by the end of the calculation the signal will decrease to the permitted range. The reason is that the MS 8 bits are the range extension and they are very seldom used, so the middle 16 bits contain the final result. This mode is enabled by writing logic one to the ACCSAT bit (CORCON register, bit 4).

A pessimistic version is to enable the saturation logic for 31 bits, in which case the accumulated value must not overrun the range at any time during the calculation of the sum of products. This mode is enabled by writing logic zero to the ACCSAT bit (CORCON register, bit 4). If the saturation logic detects that the current instruction could cause an overrun, the maximum positive value (0x007FFFFFFF) is written to the operating accumulator (A or B) if the accumulator contains a positive value, or the minimum negative value (0xFF80000000) if the accumulator contains a negative value.
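The two accumulator saturation modes can be compared with a small Python model. It is a simplified sketch: the result is clamped according to the direction of the overrun, which for the overruns discussed here selects the same limit as the sign of the accumulator. The function name and the way the ACCSAT bit is passed as an argument are assumptions.

```python
SAT31_MAX = 0x007FFFFFFF                 # maximum positive value, 31-bit mode
SAT31_MIN = 0xFF80000000 - (1 << 40)     # 0xFF80000000 as a signed number
SAT39_MAX = (1 << 39) - 1                # limits of the full 40-bit accumulator
SAT39_MIN = -(1 << 39)


def saturating_add(acc, product, accsat):
    """Add a product and clamp. accsat=1: 39-bit mode, accsat=0: 31-bit mode."""
    lo, hi = (SAT39_MIN, SAT39_MAX) if accsat else (SAT31_MIN, SAT31_MAX)
    total = acc + product
    saturated = not lo <= total <= hi    # corresponds to setting SA or SB
    return max(lo, min(hi, total)), saturated


value, flag = saturating_add(0x007FFFFFF0, 0x100, accsat=0)
print(hex(value), flag)                  # 0x7fffffff True
value, flag = saturating_add(0x007FFFFFF0, 0x100, accsat=1)
print(hex(value), flag)                  # 0x800000f0 False
```

The same addition saturates in the 31-bit mode but passes unchanged in the 39-bit mode, where the 8 extension bits absorb the temporary overrun.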

If the saturation logic is enabled, at each overrun the bit SA (register SR, bit 13) is set when the saturation logic is enabled for the accumulator A, or SB (register SR, bit 12) when the saturation logic is enabled for the accumulator B. The saturation logic for the accumulator A is enabled by setting the SATA bit (CORCON register, bit 7) to logic one. Similarly, the saturation logic for the accumulator B is enabled by setting the SATB bit (CORCON register, bit 6) to logic one.

The third mode of the saturation logic is that the overrun is tested while writing the result from the operating accumulator to a general purpose register (W0...W15). The advantage of this approach is that during calculations it allows using the full range offered by the accumulator (all 40 bits). This logic is enabled only when executing the SAC and SAC.R instructions if the SATDW bit (register CORCON, bit 5) is set to logic one. For values greater than 0x007FFFFFFF, the value 0x7FFF will be written in the memory (general purpose registers are a part of the data memory). Similarly, for values smaller than 0xFF80000000, the value 0x8000 will be written in the memory, representing the smallest negative number that can be expressed by 16 bits.
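Write-back saturation can be modeled in the same spirit. The Python sketch below takes the accumulator as a signed number and returns the 16-bit word that would be stored; it ignores the optional shift and the rounding of SAC.R, and the function name is an assumption.

```python
def sac_write_back(acc, satdw=True):
    """Model the 16-bit value stored by SAC for a signed 40-bit `acc`."""
    if satdw:
        if acc > 0x007FFFFFFF:
            return 0x7FFF                 # largest positive 16-bit value
        if acc < -(1 << 31):              # below 0xFF80000000
            return 0x8000                 # most negative 16-bit value
    return (acc >> 16) & 0xFFFF           # otherwise store the middle 16 bits


print(hex(sac_write_back(0x0012345678)))      # 0x1234
print(hex(sac_write_back(0x0100000000)))      # 0x7fff
print(hex(sac_write_back(-(1 << 35))))        # 0x8000
```

With `satdw=False` the second call would store 0x0000, which shows how an unprotected write back can turn a large positive result into a completely wrong value.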

11.8 DSP instructions

To use the DSP module in an optimum way, it is necessary to know all DSP instructions. The list of DSP instructions, including the parameter description and the operation of each instruction, is presented in Table 11-2.

Instruction Instruction and parameters Parameter description Operation description
MAC MAC Wm*Wn, Acc Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Values of the Wm and Wn registers are multiplied and added to the current value in the operating accumulator (A or B)
MAC MAC Wm*Wn, Acc, [Wx], Wxd, [Wy], Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
Wy – W10 or W11
Wyd – W6 or W7
Values of the Wm and Wn registers are multiplied and added to the current value in the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register.
MAC MAC Wm*Wn, Acc, [Wx]+=kx, Wxd, [Wy]+=ky, Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
Values of the Wm and Wn registers are multiplied and added to the current value in the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register, the Wx register value is decreased by kx, the Wy register value is decreased by ky.
MOVSAC MOVSAC Acc[Wx], Wxd, [Wy], Wyd, AWB Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
Wy – W10 or W11
Wyd – W6 or W7
AWB – W13 (Acc write-back)
The value from the operating accumulator is saved in the register W13 (AWB - accumulator write back), from the address pointed by the register Wx the value is read and written to the register Wxd, from the address pointed by the register Wy the value is read and written to the register Wyd.
MPY MPY Wm*Wn, Acc Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
The values in the Wm and Wn registers are multiplied and written to the operating accumulator.
MPY MPY Wm*Wn, Acc [Wx], Wxd, [Wy], Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
Wy – W10 or W11
Wyd – W6 or W7
The values of the Wm and Wn registers are muliplied and written to the accumulator (A or B), from the address pointed by the register Wx the value is read and written to the register Wxd, from the address pointed by the register Wy the value is read and written to the register Wyd.
MPY MPY Wm*Wn, Acc [Wx]+=kx, Wxd, [Wy]+=ky, Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
The values of the Wm and Wn registers are multiplied and written to the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register, the Wx register value is increased by kx, the Wy register value is increased by ky.
MPY MPY Wm*Wn, Acc[Wx]-=kx, Wxd, [Wy]-=ky, Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
The values of the Wm and Wn registers are multiplied and written to the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register, the Wx register value is decreased by kx, the Wy register value is decreased by ky.
MSC MSC Wm*Wn, Acc[Wx], Wxd, [Wy], Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
Wy – W10 or W11
Wyd – W6 or W7
The values of the Wm and Wn registers are multiplied and subtracted from the curent value in the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register.
MSC MSC Wm*Wn, Acc[Wx]+=kx, Wxd, [Wy]+=ky, Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
The values of the Wm and Wn registers are multiplied and subtracted from the current value in the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register, the Wx register value is increased by kx, the Wy register value is increased by ky.
MSC MSC Wm*Wn, Acc[Wx]-=kx, Wxd, [Wy]-=ky, Wyd Wm – W4 or W5
Wn – W6 or W7
Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
The values of the Wm and Wn registers are multiplied and subtracted from the current value in the operating accumulator (A or B), from the address pointed by the Wx register the value is read and written to the Wxd register, from the address pointed by the Wy register the value is read and written to the Wyd register, the Wx register value is decreased by kx, the Wy register value is decreased by ky.
NEG NEG Acc Acc – A or B (operating accumulator) Acc ← -Acc, the sign of the current value in the accumulator is changed, analogous to the multiplying of the value in the operating accumulator by –1.
REPEAT REPEAT #lit14 #lit14 – 14-bit unsigned value (0...16383) The instruction following REPEAT will be executed #lit14+1 times. Even though this is not a DSP instruction, it is very often used when using DSP instructions.
REPEAT REPEAT Wn Wn – W0...W15 The instruction following REPEAT will be executed Wn+1 times. Even though this is not a DSP instruction, it is very often used when using DSP instructions.
SAC SAC Acc, {#Slit4,} Wd Acc – A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the obtained value is loaded to Wd.
SAC SAC Acc, {Slit4,} [Wd] Acc - A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the obtained value is loaded to the address in the data memory pointed by the Wd register.
SAC SAC Acc, {#Slit4,} [Wd++] Acc – A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the obtained value is loaded to the address in the data memory pointed by the Wd register. After memory write, the value of the register Wd is incremented by 2.
SAC SAC Acc, {#Slit4,} [Wd--] Acc – A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the obtained value is loaded to the address in the data memory pointed by the Wd register. After memory write, the value of the register Wd is decremented by 2.
SAC SAC Acc, {#Slit4,} [++Wd] Acc – A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the value of the register Wd is incremented by 2 and the value obtained by shifting is saved in the address pointed by the Wd register.
SAC SAC Acc, {#Slit4,} [--Wd] Acc – A or B accumulator
{#Slit4,} – optional 4-bit constant
Wd – W0...W15
If the optional 4-bit constant is specified, the accumulator value is shifted to the right for the positive value of the constant or to the left if the constant is negative. Then, the value of the register Wd is decremented by 2 and the value obtained by shifting is saved in the address pointed by the Wd register.
SAC.R The same as for SAC The same as for SAC The same as for the SAC instruction except that the value from the accumulator is rounded by the conventional or convergent mode.
SFTAC SFTAC Acc, #Slit6 #Slit6 – 6-bit constant Shift the value in the accumulator by #Slit6 bits. If the constant is positive, shifting is to the right, otherwise to the left.
SFTAC SFTAC Acc, Wd Wd – W0...W15 Shift the value in the accumulator by Wd bits. If the register Wd is positive, shifting is to the right, otherwise to the left.
CLR CLR Acc Acc - A or B accumulator The value in the operating accumulator is set to zero.
CLR CLR Acc, [Wx], Wxd, [Wy], Wyd Acc - A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
Wy – W10 or W11
Wyd – W6 or W7
The value in the operating accumulator is set to zero. From the address in the data memory pointed by Wx the value is read and written to the register Wxd. From the address in the data memory pointed by Wy the value is read and written to the register Wyd.
CLR CLR Acc, [Wx]+=kx, Wxd, [Wy]+=ky, Wyd Acc – A or B accumulator
Wx – W8 or W9
Wxd – W4 or W5
kx – (-6,-4,-2, 2, 4, 6)
Wy – W10 or W11
Wyd – W6 or W7
ky – (-6,-4,-2, 2, 4, 6)
The value in the operating accumulator is set to zero. From the address in the data memory pointed by Wx the value is read and written to the register Wxd. From the address in the data memory pointed by Wy the value is read and written to the register Wyd. The Wx register value is increased by kx, the Wy register value is increased by ky.

Table 11-2 List of DSP instructions with description of operations and parameters.

Table 11-2 shows that some instructions (such as MAC) can have more than one form. Not all versions of the instructions have been described; the emphasis is on the most frequently used versions, in order to illustrate the way of thinking when using DSP instructions.
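
The shift-and-store behaviour described for SAC can be tried out on a PC. The following Python sketch is only a model, not device code: the names sac and to_signed are hypothetical, the accumulator is represented as a plain 40-bit signed integer, and saturation and rounding are left out. It assumes, as stated in the notes to the examples in this chapter, that SAC reads the accumulator high word (bits 31..16).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sac(acc, slit4=0):
    """Model of SAC Acc, #Slit4, Wd without saturation or rounding.

    acc   -- accumulator content as a signed integer (40-bit range)
    slit4 -- signed 4-bit shift count: positive shifts right, negative left
    """
    if not -8 <= slit4 <= 7:
        raise ValueError("Slit4 must fit in 4 signed bits")
    shifted = acc >> slit4 if slit4 >= 0 else acc << -slit4
    shifted = to_signed(shifted, 40)       # the accumulator keeps only 40 bits
    return to_signed(shifted >> 16, 16)    # bits 31..16 are written to Wd


if __name__ == "__main__":
    print(hex(sac(0x12340000)))        # no shift
    print(hex(sac(0x12340000, 4)))     # shifted right by 4 before the store
    print(hex(sac(0x12340000, -4)))    # shifted left by 4 before the store
```

The model makes it easy to see that a value sitting only in the lower 16 bits of the accumulator is not visible to SAC unless it is first moved to the high word.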

The structures of individual registers of the DSP module are given in Tables 11-3 to 11-6.

NOTE: Bits which have not been allocated any function read as '0'.

name ADR 15 14 13 12 11 10 9 8
CORCON 0x0044 - - - US EDT DL<2:0>
7 6 5 4 3 2 1 0 Reset State
SATA SATB SATDW ACCSAT IPL3 PSV RND IF 0x0020

Table 11-3 Description of the CORCON register

US – DSP multiply unsigned/signed control bit 
     (1 – unsigned multiplication, 0 – signed multiplication)
EDT – Early DO loop termination control bit. This bit will always read as '0'.
      1 – Terminate executing DO loop at the end of current loop iteration
      0 – No effect
DL<2:0> - DO loop nesting level status bits
   111 – 7 nested DO loops active
   110 – 6 nested DO loops active
   ...
   001 – 1 nested DO loop active
   000 – 0 DO loops active
SATA – AccA  saturation enable bit
       1 – Accumulator A saturation enabled
       0 – Accumulator A saturation disabled 
SATB – AccB  saturation enable bit
       1 – Accumulator B saturation enabled
       0 – Accumulator B saturation disabled
SATDW – Data space write from DSP engine saturation enable bit
       1 – Data space write saturation enabled
       0 – Data space write saturation disabled
ACCSAT – Accumulator saturation mode select bit
         1 – 9.31 saturation (super saturation)
         0 – 1.31 saturation (normal saturation)
IPL3 – CPU interrupt priority level status bit
       1 – CPU interrupt priority level is greater than 7
       0 – CPU interrupt priority level is 7 or less
PSV – Program space visibility in data space enable bit 
      (1 – program space visible in data space, 0 – program space not visible in data space)
RND – Rounding mode select bit 
      (1 – conventional rounding enabled, 0 – convergent rounding enabled)
IF – Integer or fractional multiplier mode select bit 
     (1 – integer mode enabled, 0 – fractional mode enabled (1.15 radix))
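
To check a CORCON value against Table 11-3 before writing it to the register, the bit fields can be decoded on a PC. The Python sketch below is a minimal model; the function name decode_corcon and the dictionary layout are assumptions made for illustration, and the bit positions are taken from Table 11-3.

```python
# Single-bit fields of CORCON and their bit positions (from Table 11-3).
CORCON_BITS = {
    "US": 12, "EDT": 11,
    "SATA": 7, "SATB": 6, "SATDW": 5, "ACCSAT": 4,
    "IPL3": 3, "PSV": 2, "RND": 1, "IF": 0,
}


def decode_corcon(value):
    """Return a dictionary with the value of every CORCON field."""
    fields = {name: (value >> bit) & 1 for name, bit in CORCON_BITS.items()}
    fields["DL"] = (value >> 8) & 0b111    # DL<2:0> occupies bits 10..8
    return fields


if __name__ == "__main__":
    for name, bit_value in decode_corcon(0x00F1).items():
        print(name, bit_value)
```

For the value $00F1 the sketch reports SATA, SATB, SATDW, ACCSAT and IF set and all other fields zero; for the reset state $0020 only SATDW is set.
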
name ADR 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 Reset State
ACCAU 0x0026 SE ACCAU 0x0000

Table 11-4a Description of the ACCA register

name ADR 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 Reset State
ACCAH 0x0024 ACCAH 0x0000

Table 11-4b Description of the ACCA register

name ADR 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Reset State
ACCAL 0x0022 ACCAL 0x0000

SE – Sign extension for AccA accumulator

Table 11-4c Description of the ACCA register

name ADR 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 Reset State
ACCBU 0x002C SE ACCBU 0x0000

Table 11-5a Description of the ACCB register

name ADR 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 Reset State
ACCBH 0x002A ACCBH 0x0000

Table 11-5b Description of the ACCB register

name ADR 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Reset State
ACCBL 0x0028 ACCBL 0x0000

SE – Sign extension for AccB accumulator

Table 11-5c Description of the ACCB register
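
The three 16-bit registers of Tables 11-4 and 11-5 together hold one 40-bit accumulator value. The Python sketch below models this on a PC. It assumes that the SE field occupies the upper 8 bits of ACCxU and repeats bit 39 of the accumulator; the function names are hypothetical.

```python
def acc_from_registers(accu, acch, accl):
    """Combine ACCxU, ACCxH and ACCxL into one signed 40-bit value."""
    raw = ((accu & 0xFF) << 32) | ((acch & 0xFFFF) << 16) | (accl & 0xFFFF)
    return raw - (1 << 40) if raw & (1 << 39) else raw


def registers_from_acc(acc):
    """Split a signed 40-bit value into (ACCxU, ACCxH, ACCxL)."""
    raw = acc & ((1 << 40) - 1)
    upper = (raw >> 32) & 0xFF
    # Assumption: the SE bits copy bit 39 into the upper byte of ACCxU.
    accu = upper | (0xFF00 if upper & 0x80 else 0x0000)
    return accu, (raw >> 16) & 0xFFFF, raw & 0xFFFF


if __name__ == "__main__":
    print([hex(r) for r in registers_from_acc(-1)])
    print(acc_from_registers(0x0000, 0x0000, 0x02C6))
```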

name ADR 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Reset State
SR 0x0042 OA OB SA SB OAB SAB DA DC IPL<2:0> RA N OV Z C 0x0000

Table 11-6 Description of the SR register

OA – Accumulator A overflow status bit 
     (1 – accumulator A overflowed, 0 – accumulator A has not overflowed)
OB - Accumulator B overflow status bit 
     (1 – accumulator B overflowed, 0 – accumulator B has not overflowed)
SA – Accumulator A saturation ‘sticky’ status bit. 
     This bit can be cleared or read but not set to ‘1’.
     1 – accumulator A is saturated or has been saturated at some time
     0 – accumulator A is not saturated
SB – Accumulator B saturation ‘sticky’ status bit. 
     This bit can be cleared or read but not set to ‘1’.
     1 – accumulator B is saturated or has been saturated at some time
     0 – accumulator B is not saturated
OAB - OA || OB combined accumulator overflow status bit
      1 – accumulator A or B has overflowed
      0 – neither accumulator A nor B has overflowed
SAB - SA || SB combined accumulator ‘sticky’ status bit
      1 – accumulator A or B is saturated or has been saturated at some time
      0 – neither accumulator A nor B is saturated
DA – DO loop active bit 
     (1 – DO loop in progress, 0 – DO loop not in progress)
DC – MCU ALU half carry/borrow bit (1 – a carry-out from the 4th order bit 
     (8-bit operations) or 8th order bit (16-bit operations) of the result occurred, 
     0 – no carry-out from the 4th order bit (8-bit operations) or
     8th order bit (16-bit operations) of the result occurred)
IPL<2:0> - CPU interrupt priority level status bits. 
           These bits are concatenated with the IPL<3> bit (CORCON<3>) to form 
           the CPU interrupt priority level. The value in parentheses applies when IPL<3> = 1.
    111 – CPU interrupt priority level is 7 (15). User interrupts disabled.
    110 – CPU interrupt priority level is 6 (14).
    ...
    001 – CPU interrupt priority level is 1 (9).
    000 – CPU interrupt priority level is 0 (8).
    
RA – REPEAT loop active bit 
     (1 – REPEAT loop in progress, 0 – REPEAT loop not in progress)
N – MCU ALU negative bit 
    (1 – result was negative, 0 – result was non-negative (zero or positive))
OV – MCU ALU overflow bit (1 – overflow occurred for signed arithmetic, 
     0 – no overflow occurred). This bit is used for signed arithmetic (2’s complement).
     It indicates an overflow of the magnitude which causes the sign bit to 
     change state.
Z – MCU ALU Zero bit (1 – a zero result, 0 – a non-zero result)
C – MCU ALU carry/borrow bit (1 – a carry-out from the MS bit of the result occurred, 
    0 – no carry-out from the MS bit of the result occurred)
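
The way the IPL<2:0> bits of SR and the IPL3 bit of CORCON combine into one priority level can be sketched in a few lines of Python. This is a PC-side illustration only; the function name cpu_priority is hypothetical and the bit positions are taken from Tables 11-3 and 11-6.

```python
def cpu_priority(sr, corcon):
    """Return the CPU interrupt priority level (0..15) from SR and CORCON."""
    ipl_low = (sr >> 5) & 0b111     # IPL<2:0> are bits 7..5 of SR
    ipl3 = (corcon >> 3) & 1        # IPL3 is bit 3 of CORCON
    return (ipl3 << 3) | ipl_low


if __name__ == "__main__":
    print(cpu_priority(0x00E0, 0x0000))   # IPL<2:0> = 111, IPL3 = 0
    print(cpu_priority(0x00E0, 0x0008))   # IPL<2:0> = 111, IPL3 = 1
```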

11.9 DSP examples

Example 1 – (calculation of the sum of products of two arrays):

Calculation of the sum of products of two arrays

The example shows how the DSP module can be used for fast calculation of the sums of products of two arrays. The elements of the arrays are kept in the data memory. The X and Y spaces have been detailed in Chapter 8.

{dsPIC30F6014A}

program ArraySum;

var
  i    : word;
  arr1 : array[20] of Integer; absolute $0900;// array of 20 signed-integer elements in X space
  arr2 : array[20] of Integer; absolute $1900;// array of 20 signed-integer elements in Y space
                                              // replace $1900 with $0D00 for dsPIC30F4013 MCU
                                               
begin

  TRISD := 0;             // configure PORTD as output
  
  for i := 0 to 19 do     // init arr1 and arr2
    begin
      arr1[i] := i+3;
      arr2[i] := 15-i;
    end;

  CORCON := $00F1;        // signed computing, saturation for both Acc, integer computing

  asm
    mov    #@_arr1, W8    //  W8 := @arr1, point to a first element of array
    mov    #@_arr2, W10   // W10 := @arr2, point to a first element of array

    mov    #0, W4         // clear W4
    mov    #0, W6         // clear W6

    clr    A
    repeat #20
    mac    W4*W6, A, [W8]+=2, W4, [W10]+=2, W6     // AccA := sum(arr1*arr2)

    sftac  A, #-16        // shift the result in high word of AccA
    sac    A, #0, W1      // W1 := sum(arr1*arr2)
  end;
  
  LATD := W1;             // LATD := sum(arr1*arr2)
  
end.

Example 1 presents the program for calculating the sum of products of the array elements. The elements of arr1 are stored in the X space and the elements of arr2 in the Y space of the data memory.

At the start of the program the values of the array elements are initialized.

for i:=0 to 19 do    
    begin
      arr1[i]:=i+3;
      arr2[i]:=15-i;
     end;

Before the calculation starts, the DSP module has to be set for signed-integer computing. This is done by writing $00F1 to the CORCON register (CORCON:=$00F1;). At the same time the saturation logic for both accumulators (A and B) is enabled, even though accumulator B will not be used. Table 11-3 gives the meanings of individual bits of the CORCON register.

The next step is writing the initial addresses (addresses of the first array elements) of the arr1 and arr2 arrays to the W8 and W10 registers, respectively. It has been decided to use the registers W8 and W10 for specifying the addresses of the next array elements and the registers W4 and W6 for the array elements being multiplied in the current iteration (partial sum).

mov #@_arr1, W8    
mov #@_arr2, W10

Since the addresses of the first array elements are saved in the W8 and W10 registers, the process of multiplying and accumulating the partial sums can be started. Of course, the initial value of the accumulator A is set to zero by clr A.

The instruction MAC, which calculates the partial sums and adds them to the current accumulator value, has to be executed 21 times. The first partial sum is zero since the values of the registers W4 and W6 are zero. The purpose of the first execution of the MAC instruction is to read the values of the first elements from the data memory and write them to the registers W4 and W6. After that, the instruction MAC is executed 20 times, calculating the partial sums which are accumulated in accumulator A. Since REPEAT #lit14 executes the following instruction #lit14+1 times, repeat #20 gives the required 21 executions. The instruction MAC and the corresponding parameters are described in Table 11-2.

repeat #20
mac W4*W6, A, [W8]+=2, W4, [W10]+=2, W6 //AccA:=sum(arr1*arr2)
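
Why 21 executions are needed can be seen from a simple PC model of the loop. The Python sketch below is an illustration only: the function name mac_loop is hypothetical, saturation is ignored, and the value fetched past the end of the arrays in the last execution is assumed to be irrelevant (it is modelled as 0 and never multiplied).

```python
def mac_loop(x, y, repeat_count):
    """Model of: repeat #repeat_count / mac W4*W6, A, [W8]+=2, W4, [W10]+=2, W6."""
    acc = 0
    w4 = w6 = 0          # both operand registers are cleared before the loop
    index = 0            # plays the role of the W8 and W10 pointers
    for _ in range(repeat_count + 1):         # REPEAT #n runs n+1 times
        acc += w4 * w6                        # multiply the operands fetched last time
        w4 = x[index] if index < len(x) else 0    # prefetch for the next execution
        w6 = y[index] if index < len(y) else 0
        index += 1
    return acc


if __name__ == "__main__":
    arr1 = [i + 3 for i in range(20)]
    arr2 = [15 - i for i in range(20)]
    print(mac_loop(arr1, arr2, 20))   # all 20 products accumulated
    print(mac_loop(arr1, arr2, 19))   # one execution short: last product missing
```

With the array contents of Example 1 the full loop accumulates 710, whereas repeat #19 would leave out the last product.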

After the instruction MAC has been executed, the result is in the lower 16 bits of accumulator A. The result could be read directly from AccAL, i.e. from address 0x0022 (see Table 11-4c), but it is regular practice to shift the result to AccAH, i.e. to shift it left by 16 places, and then read it by using instruction SAC. In this way the consequences of an overflow, if it occurs, are mitigated. In this case no overflow occurs; nevertheless, the result is read in the regular way.

The shift left 16 times is performed by the instruction:

SFTAC A, #-16

The instruction SFTAC with its parameters is described in Table 11-2. After the result has been shifted 16 places to the left, it is read and saved in the W1 register. This is done by the instruction:

SAC A, #0, W1

The instruction SAC with its parameters is described in Table 11-2.

NOTE: The instruction SAC reads the results from AccAH (see Fig. 11-4)
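
The effect of the two instructions on the result of Example 1 can be followed on a PC. The Python sketch below is a model under simplifying assumptions: the accumulator is a plain 40-bit signed integer, saturation is ignored, and the names wrap40, sftac and high_word are hypothetical.

```python
MASK40 = (1 << 40) - 1


def wrap40(value):
    """Keep 40 bits and interpret them as a two's-complement number."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def sftac(acc, shift):
    """Model of SFTAC: a positive count shifts right, a negative count shifts left."""
    return wrap40(acc >> shift if shift >= 0 else acc << -shift)


def high_word(acc):
    """Bits 31..16 of the accumulator as a signed 16-bit value (what SAC stores)."""
    word = (acc >> 16) & 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    acc = 710                              # sum of products left by the MAC loop
    print(high_word(acc))                  # without the shift SAC would not see it
    print(high_word(sftac(acc, -16)))      # after SFTAC A, #-16
```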

Example 2 – (using modulo addressing and PSV management)

The example shows the use of the modulo addressing and PSV management. The result is the sum of array elements of alternated signs:

Sum of array elements of alternated signs

The elements are saved in the data memory and the sign (-1, +1) in the program memory.

{dsPIC30F6014A}

program AlternateSum;

const Sgn : array[2] of Integer = (1,-1);  // Signs for the sum

var
  arr      : array[14] of Integer;//array of 14 signed-integers in Y space (Y space is default)
  i        : Integer;
  adr2     : Word;

begin

  TRISD := 0;               // Configure PORTD as output
  
  for i := 0 to 13 do
    arr[i] := i+1;          // init arr

  adr2 := @Sgn;             // dummy line, just for linking Sgn before usage inside asm block
  
  MODCON  := $8008;         // X modulo addressing, on W8 register
  XMODSRT := adr2;          // XMODSRT points to the start of Sgn array
  XMODEND := adr2+3;        // XMODEND points to the end of Sgn array

  CORCON := $00F5;          // Signed computing, saturation for both Acc, 
                            //integer computing, PSV management

  asm
    mov    #@sgn, W8        // W8  := @Sgn in X space (mirror), 
                            // points to a 1st of 2 elements in Sgn array
    mov #@_arr, W10         // W10 := @arr, points to a first element of arr
    mov #0, W4              // clear W4
    mov #0, W6              // clear W6
    clr A                   // clear accumulator for computing
    repeat #14              // 15 iterations
    mac W4*W6, A, [W8]+=2, W4, [W10]+=2, W6  // AccA := sum(Sgn*arr)
    sftac A, #-16           // shift the result in high word of AccA
    sac A, #0, W1           // W1 := sum(Sgn*arr)
  end;
  
  LATD := W1;               // LATD := sum(Sgn*arr)
  
end.

Example 2 shows the method of using modulo addressing, described in Chapter 8, Section 8.3 and PSV management, described in Chapter 11, Section 11.2.

Constants +1 and -1 for multiplying the elements of the array arr are saved in the program memory. The advantage of this approach is a reduction in using the data memory. It is particularly suitable when several arrays having constant elements should be saved. Then, the use of the program memory is recommended. The data memory should be used when the array elements are not constants (unknown at the moment of compiling) or if the program memory is full (a rare event).

Addresses in the program memory are 24-bit, whereas addresses in the data memory are 16-bit. For this reason it is necessary to perform mirroring, by using PSV management, in the upper block of the data memory (addresses above $7FFF). The mirroring is performed in two steps:

  1. Write the corresponding value to the PSVPAG register (see Figs. 11-3 and 11-7).
  2. Enable PSV management by setting the PSV bit (CORCON<2>, see Table 11-3).

Obtaining the value to be written to the PSVPAG register is shown in Fig. 11-7.

Fig. 11-7 Obtaining the value of the PSVPAG register

Writing to the PSVPAG register and obtaining the effective address (the address of the array mirrored to the data memory) are carried out by the following set of instructions:

ptr:=@Sgn;           //ptr points to Sgn (24-bit address)
adr:=Hi(ptr);        //Get upper Word (16 bits)
adr2:=adr and $00FF; //Only lower 8 bits are relevant
PSVPAG:=adr2;        //Load PSVPAG
adr2:=ptr and $FFFF; //Get lower Word (16bits). Mirrored address in X space
adr2.15:=1;          //Upper data-MEM

The variables ptr and adr are 32 bits long (LongInt). When this part of the code has been executed, the PSVPAG register contains the corresponding value and adr2 holds the address of the first element of the constant array.

The array arr comprises 14 elements and array Sgn only 2. In order to calculate the required sum, modulo addressing should be used for the Sgn array. This is enabled by writing 0x8008 in the MODCON register.

MODCON:=$8008

This instruction enables modulo addressing in the X space via the W8 register. The structure of the MODCON register is shown in Chapter 8, Table 8-3.

After modulo addressing has been enabled, it is necessary to define the initial and final addresses of this addressing by writing the corresponding values to the XMODSRT and XMODEND registers. The initial address of the constant array contained by adr2 is written to the XMODSRT register. The address of the last byte of the constant array, i.e. adr2+4-1 (+4 because two elements occupy 2 locations (bytes) each and –1 to obtain the address of the last byte) is written to the XMODEND register.

XMODSRT:=adr2;
XMODEND:=adr2+3;
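
How a pointer behaves between XMODSRT and XMODEND can be illustrated with a short PC model. The Python sketch below is only an approximation of the hardware for a positive step: the function name modulo_addresses is hypothetical and the start address 0x8100 in the usage lines is an arbitrary example value, not the real address of Sgn.

```python
def modulo_addresses(start, end, first, step, count):
    """Return `count` effective addresses of a pointer that wraps within [start, end].

    start, end -- values of XMODSRT and XMODEND (end is the last byte of the buffer)
    first      -- initial pointer value (W8 in Example 2)
    step       -- post-increment in bytes (2 for 16-bit elements)
    """
    length = end - start + 1
    addresses = []
    addr = first
    for _ in range(count):
        addresses.append(addr)
        addr = start + (addr - start + step) % length    # wrap back to the start
    return addresses


if __name__ == "__main__":
    adr2 = 0x8100                     # hypothetical mirrored address of Sgn
    for a in modulo_addresses(adr2, adr2 + 3, adr2, 2, 6):
        print(hex(a))
```

With a 4-byte buffer and a step of 2 the pointer alternates between the two elements, which is exactly what is needed to reuse the two signs for all elements of arr.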

The next step is setting the required bits in the CORCON register. The structure of the CORCON register is shown in Table 11-3. For the signed computing (positive and negative numbers), enabled saturation logic for both accumulators, enabled saturation logic during writing to the data memory, integer computing and enabled PSV management the value 0x00F5 should be written to the CORCON register. Enabling the saturation logic for accumulator B is superfluous, but it is inserted in the example to show that the saturation logic can be enabled for both accumulators simultaneously.

After the corresponding value has been written to the CORCON register, the initialization of the W4, W6, W8 and W10 registers is performed. The registers W4 and W6 are set to zero, whereas the addresses of the first elements of the arrays Sgn and arr are written to the registers W8 and W10, respectively.

CORCON:=$00F5;
asm
  mov #@_adr2, W8
  mov [W8], W8
  mov #@_arr, W10
  mov #0,W4
  mov #0, W6

The initial value of the accumulator is set to zero by the instruction clr A. After that, the computing may start.

clr A

The repeat loop is executed 15 times. The first execution of the mac instruction writes the first elements of the arrays to the W4 and W6 registers, and the following 14 executions calculate the 14 partial sums. The consequence of enabling modulo addressing is that the two elements of the array Sgn are read alternately (first, second, first, second, ...).

repeat #14
mac W4*W6, A, [W8]+=2, W4, [W10]+=2, W6
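
The combined effect of the repeat loop and modulo addressing can be checked with a PC model. The Python sketch below is an illustration under simplifying assumptions: the function name alternating_mac is hypothetical, the modulo buffer is modelled by indexing Sgn with the remainder of the iteration number, saturation is ignored, and the value fetched past the end of arr in the last execution is modelled as 0.

```python
def alternating_mac(sgn, arr, repeat_count):
    """Model of the MAC loop of Example 2 with modulo addressing on the sign array."""
    acc = 0
    w4 = w6 = 0
    for k in range(repeat_count + 1):          # REPEAT #n runs n+1 times
        acc += w4 * w6                         # uses the operands fetched last time
        w4 = sgn[k % len(sgn)]                 # W8 wraps around inside Sgn
        w6 = arr[k] if k < len(arr) else 0     # W10 walks linearly through arr
    return acc


if __name__ == "__main__":
    sgn = [1, -1]
    arr = [i + 1 for i in range(14)]
    result = alternating_mac(sgn, arr, 14)
    print(result, hex(result & 0xFFFF))        # signed value and 16-bit pattern
```

For the array contents of Example 2 (1, 2, ..., 14) the model gives 1-2+3-4+...+13-14 = -7, i.e. the 16-bit pattern $FFF9.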

After the loop is completed, the result is in the lowest 16 bits of accumulator A. This result can be read directly from AccAL, i.e. from the address 0x0022 (see Table 11-4c). Another approach is used in the example in order to illustrate the use of instructions sftac and sac. These instructions are described in Table 11-2. At first, by using instruction sftac, the result is shifted to the middle 16 bits of accumulator A. Then, by instruction sac, the result is written to the W1 register and from there forwarded to port D.

  sftac A, #-16
  sac A, #0, W1
end;
LATD:=W1;

NOTE: Instruction SAC reads the result from AccAH (see Fig. 11-4).

Example 3 – (calculation of mathematical expectation of an array)

In the example it is shown how, by using instruction add, one can select one of the accumulators as a destination and how to use the instruction div for dividing two signed integers.

Division is also available directly in the compiler and its use is very simple. However, the most efficient use of the DSP module is by using the assembler, so the purpose of this example and of the other examples in this chapter is familiarization with the assembler instructions.

For the calculation of mathematical expectation of an array in this example, a procedure is used. The expression for calculating mathematical expectation is:

$$R = \frac{1}{N} \sum_{i=0}^{N-1} arr[i]$$

where N is the number of elements in the array, arr[i] is the i-th array element and R is the mathematical expectation.

{dsPIC30F6014A}

program Mean;

var
  arr : array[15] of Integer;   // Array of 15 signed-integer elements
  i, MeanRes : Word;

procedure MeanVar(ptrArr : Word; Len : Word; ptrMean : Word);
begin
  CORCON := $00F1;            // Signed computing, saturation for both Acc, integer computing
  asm
    mov [W14-8], W10          // W10 := ptrArr
    mov [W14-10], W7          // W7 := Len
    sub W7, #1, W2            // W2 := Len-1

    clr A
    repeat W2
    add [W10++], #0, A        // A := sum(arr)
                              
    add W7, #1, A             // A := A + (Len/2) for div's lack of rounding ...
    sac.r A, #0, W3           // W3 := round(AccA)

    repeat #17                // 18 iterations of signed-divide. Result in W0
    div.s W3, W7              // W0 := sum(arr)/Len

    mov [W14-12], W4          // W4 := ptrMean
    mov W0, [W4]              // Mean := Mean(arr)
  end;
end;

begin
  TRISD := 0;                 // Configure PORTD as output
  
  MeanRes := 0;
  for i := 0 to 14 do
    arr[i]:=i;                // Init arr

  MeanVar(@arr, 15, @MeanRes);// call subroutine
                              
  LATD := MeanRes;            // Send result to LATD
end.

The main program is very simple. After setting port D as output and initializing the input array, the procedure for calculating mathematical expectation is called. The result is then sent to port D.

TRISD := 0;                 // Configure PORTD as output  
MeanRes := 0;
for i := 0 to 14 do
  arr[i]:=i;                // Init arr
MeanVar(@arr, 15, @MeanRes);// call subroutine                              
LATD := MeanRes;            // Send result to LATD

The procedure for calculating mathematical expectation has only 3 parameters. The first parameter is the address of the first array element (ptrArr = @arr). The second parameter is the number of array elements (Len = 15). The third parameter is the address of the variable where the result, i.e. mathematical expectation, should be written (ptrMean = @MeanRes).

The procedure consists of three parts:

  1. Summation of array elements
  2. Division of the result by the number of array elements
  3. Saving the result

In order to use the accumulator correctly, the operating conditions of the accumulator should be defined first. This is done by setting the corresponding bits in the CORCON register. The structure of the CORCON register is shown in Table 11-3.

CORCON:=$00F1;

For the signed computing (positive and negative numbers), enabled saturation logic for both accumulators, enabled saturation logic while writing to the data memory and integer computing the value 0x00F1 should be written to the CORCON register. Enabling the saturation logic for the accumulator B is superfluous, but it is inserted in the example to show that the saturation logic can be enabled for both accumulators simultaneously.

After the CORCON register is set, the address of the first array element is written to the W10 register. The number of array elements is written to the W7 register.

mov [W14-8], W10
mov [W14-10], W7

Instruction add has to be executed once for each array element. Since REPEAT Wn executes the following instruction Wn+1 times, the loop counter must hold the number of elements decremented by 1. The number of elements is required later for performing division, so the W7 register is left unchanged and the decremented value is saved in the W2 register. This is done by the instruction:

sub W7, #1, W2

Instruction add will be executed the required number of times. The result of each call is the partial sum which is added to the content of the accumulator A. After completion of the loop, the sum of all array elements is in the accumulator.

clr A
repeat W2
add [W10++], #0, A

To obtain the mathematical expectation, the sum of all array elements should be divided by the number of array elements. The division itself cannot round the result to the nearest integer. For this reason the value Len/2 is first added to the sum of all array elements. This has the same effect as adding 0.5 to the result, which is not possible directly because of integer computing. Adding the value Len/2 is done by the instruction add W7, #1, A. This instruction adds the value of the register W7 shifted one position to the right, which is analogous to dividing by two, to the current value in accumulator A. After that, the value in accumulator A is read, with rounding, into the W3 register by the instruction sac.r A, #0, W3, so that it can be divided by the number of array elements in the next step. Instruction sac.r is described in Table 11-2.

add W7, #1, A
sac.r A, #0, W3
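
The two instructions can be followed step by step in a PC model. The Python sketch below is an illustration under stated assumptions, not a description of the silicon: the names sac_r and rounded_mean are hypothetical, the values added by add are assumed to enter the accumulator high word (bits 31..16), saturation is ignored, and only non-negative operands are divided so that the integer division of Python can stand in for div.s.

```python
def sac_r(acc, conventional=False):
    """Round an accumulator value to its high word (bits 31..16).

    Convergent rounding (the default, RND = 0) resolves an exact tie so that
    the result is even; conventional rounding (RND = 1) rounds a tie up.
    """
    high, low = acc >> 16, acc & 0xFFFF
    if low > 0x8000 or (low == 0x8000 and (conventional or high & 1)):
        high += 1
    return high


def rounded_mean(values, conventional=False):
    """Model of the summation, the Len/2 correction, sac.r and the division."""
    n = len(values)
    acc = 0
    for v in values:
        acc += v << 16              # add [W10++], #0, A
    acc += (n << 16) >> 1           # add W7, #1, A : Len/2 in the same scale
    w3 = sac_r(acc, conventional)   # sac.r A, #0, W3
    return w3 // n                  # div.s W3, W7 (quotient only)


if __name__ == "__main__":
    print(rounded_mean(list(range(15))))   # array of Example 3
    print(rounded_mean([1, 2]))            # mean 1.5 is rounded up by the correction
```

For the array of Example 3 (0, 1, ..., 14) the accumulator holds 105 + 7.5 before sac.r; the model returns a mean of 7 with either rounding mode.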

The dsPIC30F devices do not complete a division in a single instruction. Division is performed in 18 iterations of the instruction div, executed in a repeat loop. The result is saved in the W0 register and the remainder in the W1 register. The sum of all array elements is saved in the W3 register and the number of array elements in the W7 register. Therefore, the instruction div.s W3, W7 is called in the loop. After the loop is completed, the result is saved in the W0 register.

repeat #17
div.s W3, W7

Since the value of mathematical expectation is in the W0 register, it is necessary to write this value to the destination address (third parameter of the procedure). In this way the obtained value of mathematical expectation is forwarded to the main program for further processing.

mov [W14-12], W4
mov W0, [W4]
