Calculating the sum of products comprises several steps:
The first element of the first array and the first element of the second array (a1, b1) are taken first.
The two values are then multiplied.
The result is added to the value already held in the accumulator: A = A + a1*b1.
The result of multiplying two elements is called a partial sum.
The process is continued with the next two array elements, until the last two elements have been multiplied.
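The steps above can be sketched in plain C as follows. This is only an illustrative model that runs on any host, not code for the DSP hardware. The function name sum_of_products and the 64-bit accumulator are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products of two equal-length arrays of 16-bit elements.
 * The 64-bit accumulator is a choice made for this sketch so that the
 * running total cannot overflow for realistic array lengths. */
int64_t sum_of_products(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int64_t acc = 0;                              /* accumulator A starts at zero */
    for (size_t i = 0; i < n; i++) {
        int32_t partial = (int32_t)a[i] * b[i];   /* multiply the i-th pair       */
        acc += partial;                           /* A = A + partial sum          */
    }
    return acc;
}
```

For the arrays {1, 2, 3} and {4, 5, 6} the partial sums are 4, 10 and 18, so the function returns 32.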
As can be seen, the process starts by reading two array elements, the first element of the first array and the first element of the second array. Two details need to be considered. Firstly, the elements being multiplied come from different arrays, so their values are not stored in adjacent locations. Secondly, the two elements always have identical indices, so both addresses always change by the same amount. For example, if the arrays consist of 16-bit elements, then after each partial sum is calculated the addresses of the next two elements must be increased or decreased by 2, depending on whether the array is stored with increasing or decreasing index. For such arrays the change will always be ±2. This is important because it allows the change to be carried out by hardware!
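The fixed address step can be illustrated with a short C sketch. The helper next_element_address is a hypothetical name introduced here, and the addresses used in the usage note are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the next element, given the current address, the element
 * size in bytes and the direction in which the array is traversed.
 * direction must be +1 (increasing index) or -1 (decreasing index). */
uintptr_t next_element_address(uintptr_t addr, unsigned elem_size, int direction)
{
    assert(direction == 1 || direction == -1);
    return direction > 0 ? addr + elem_size : addr - elem_size;
}
```

With 16-bit elements (elem_size equal to 2), an element at address 0x0800 is followed by the element at 0x0802 when the index increases, and stepping back from 0x0802 returns to 0x0800. Because the step never changes during the calculation, no arithmetic instruction is needed to produce it.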
Reading the array elements to be multiplied can be accelerated by accessing both memory locations simultaneously. This requires that the microcontroller has two data buses and two address generators, and that the structure of the memory allows simultaneous access to different locations. The devices of the dsPIC30F family meet these requirements. The data buses are called the X and Y buses, as shown in Fig. 11-1. There are two address generators, because both addresses to be read have to be calculated simultaneously, and the memory supports multiple simultaneous accesses.
The X and Y data buses are considered here. To facilitate the realization, some constraints have to be introduced. The data memory in which the array elements are stored is split into two sections, the X and Y data spaces, as shown in Fig. 11-2.
Fig. 11-2 Organizations of data memories of dsPIC30F4013 and dsPIC30F6014A devices
Fig. 11-2 shows an example of the data memory organization of a dsPIC30F4013 device. The memory capacity, i.e. the size of the X and Y data spaces, is device specific. The space for the special function registers (SFR) remains the same, and so does the starting address of the X space.
Splitting the data memory into the X and Y data spaces introduces the constraint that each data bus can access only one of the spaces. An attempt to access the other space (e.g. an attempt to access the X data space via the Y data bus) will result in a trap or, if no trap handler has been specified, in a device reset.
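The constraint can be modelled in C as a simple address check. The boundary values X_START, Y_START and Y_END below are placeholders chosen for illustration; the real values are device specific and must be taken from the data sheet of the device in use. The names data_bus_t and access_allowed are likewise hypothetical, and the extended X space is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map, for illustration only. The real boundaries
 * are device specific and must be taken from the device data sheet. */
enum { X_START = 0x0800, Y_START = 0x0C00, Y_END = 0x0FFF };

typedef enum { BUS_X, BUS_Y } data_bus_t;

/* Returns true if the given bus may read the given address,
 * false if the access would cause a trap in this model. */
bool access_allowed(data_bus_t bus, uint16_t addr)
{
    assert(bus == BUS_X || bus == BUS_Y);
    if (bus == BUS_X)
        return addr >= X_START && addr < Y_START;   /* X bus: X space only */
    return addr >= Y_START && addr <= Y_END;        /* Y bus: Y space only */
}
```

In this model, reading address 0x0C00 over the X bus is rejected, while the same address is accepted on the Y bus. An array intended for the Y bus therefore has to be placed in the Y space when memory is allocated.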
The existence of constraints when using the DSP instructions has already been mentioned. The principal constraints when reading data are:
The X data bus has access only to the X space in the data memory, or to the extended X space, which will be discussed later,
The Y data bus has access only to the Y space, which, unlike the X space, has no extension,
The values to be sent to the multiplier for processing have to be in the general purpose registers W4...W7, specifically W4 and W5 for X space and W6 and W7 for Y space. The general purpose registers for the address pointers indicating the array elements to be read are W8...W11, specifically W8 and W9 for X space and W10 and W11 for Y space.
In practice, the procedure is as follows:
Load the address of the next array element from the X space to the register W8 or W9.
Simultaneously, load the address of the next array element from the Y space to the register W10 or W11.
Then, simultaneously read both array elements and load the corresponding values to the W4 or W5 register for the X array and to the W6 or W7 register for the Y array.
Values of the registers W8/9 and W10/11 are automatically incremented by 2 (if the array elements are 16-bit quantities).
Of course, all this is carried out by the hardware. The code only needs to specify the address step (increase or decrease) used when calculating the partial sums, the initial and final addresses of the arrays, and the registers used for loading the addresses and reading the array elements.
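The read step of this procedure can be modelled in C as shown below. The structure fetch_state_t and the function fetch_pair are hypothetical names used only in this sketch; the comments indicate which registers each field stands in for. On the device both reads happen at the same time, whereas this host-side model performs them one after the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const int16_t *x_ptr;   /* plays the role of W8 or W9   */
    const int16_t *y_ptr;   /* plays the role of W10 or W11 */
    int16_t x_val;          /* plays the role of W4 or W5   */
    int16_t y_val;          /* plays the role of W6 or W7   */
} fetch_state_t;

/* Read one element from each array, then advance both pointers. */
void fetch_pair(fetch_state_t *s)
{
    assert(s != NULL);
    s->x_val = *s->x_ptr;   /* read the X array element             */
    s->y_val = *s->y_ptr;   /* read the Y array element             */
    s->x_ptr += 1;          /* one int16_t element, i.e. two bytes  */
    s->y_ptr += 1;
}
```

Incrementing an int16_t pointer by one element corresponds to the address increment of 2 described above.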
The most frequently used DSP instruction is MAC. The following example shows one of the forms of its use.
MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6
The instruction from this example:
Multiplies the values in the registers W4 and W6,
Adds the result of the multiplication to the accumulator A,
Loads the value of the next element from the X space address pointed to by the register W8 into the register W4,
After reading the array element in the X space, increments the register W8 by 2 so that it points to the next element of the X array,
Loads the value of the next element from the Y space address pointed to by the register W10 into the register W6,
After reading the array element in the Y space, increments the register W10 by 2 so that it points to the next element of the Y array.
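The order of these actions matters: the multiplication uses the values already held in W4 and W6, while the two reads fetch the operands for the next instruction. The following C model is a sketch of that behaviour under simplifying assumptions. The names dsp_regs_t, mac_w4_w6 and dot_with_mac are hypothetical, a 64-bit integer stands in for the accumulator A, and the real accumulator width, saturation and fractional arithmetic are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t acc_a;                /* stands in for accumulator A        */
    int16_t w4, w6;               /* multiplier operands                */
    const int16_t *w8, *w10;      /* X and Y array pointers             */
} dsp_regs_t;

/* Model of: MAC W4*W6, A, [W8]+=2, W4, [W10]+=2, W6 */
void mac_w4_w6(dsp_regs_t *r)
{
    r->acc_a += (int32_t)r->w4 * r->w6;   /* uses the values loaded earlier */
    r->w4 = *r->w8;   r->w8  += 1;        /* next X element, pointer += 2 bytes */
    r->w6 = *r->w10;  r->w10 += 1;        /* next Y element, pointer += 2 bytes */
}

/* Sum of products built from the model above. */
int64_t dot_with_mac(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL && n > 0);
    dsp_regs_t r = { .acc_a = 0, .w4 = 0, .w6 = 0, .w8 = x, .w10 = y };

    r.w4 = *r.w8++;                       /* prime W4 and W6 with the first pair */
    r.w6 = *r.w10++;
    for (size_t i = 0; i + 1 < n; i++)
        mac_w4_w6(&r);                    /* multiply and fetch the next pair    */
    r.acc_a += (int32_t)r.w4 * r.w6;      /* last product, nothing left to fetch */
    return r.acc_a;
}
```

Because the operands are fetched one step ahead, W4 and W6 have to be loaded with the first pair before the first MAC, and the last product is accumulated without a further read so that the model never reads past the end of the arrays.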
The hardware specialized for DSP instructions allows all of this to be executed in one instruction cycle! As can be seen, the DSP module makes the devices of the dsPIC30F family very powerful. If the device clock is 80 MHz, the instruction clock is 20 MHz, i.e. 20 million MAC instructions can be executed in one second, each including all six actions listed above!
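The arithmetic behind these figures can be checked with a few lines of C. The function names are hypothetical, and the 4:1 ratio between the device clock and the instruction clock is simply the ratio implied by the 80 MHz and 20 MHz values quoted above. Loop overhead and the initial operand loads are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction clock for a given device clock, using the 4:1 ratio
 * implied by the figures in the text (80 MHz -> 20 MHz). */
uint32_t instruction_clock_hz(uint32_t device_clock_hz)
{
    return device_clock_hz / 4u;
}

/* Time in nanoseconds needed for n MAC instructions, one per cycle. */
uint64_t mac_time_ns(uint32_t device_clock_hz, uint32_t n_macs)
{
    uint32_t fcy = instruction_clock_hz(device_clock_hz);
    assert(fcy > 0);
    return (uint64_t)n_macs * 1000000000ull / fcy;
}
```

At a device clock of 80 MHz one MAC instruction takes 50 ns, so a sum of products over 64 pairs of elements needs about 3200 ns, i.e. 3.2 microseconds.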