Hello there,
I am on my way to get the fft working the way I need it.
That means I need higher frequency-resolution as well as a higher samplingrate.
The samplingrate was no problem, now I am working on the improvement of the frequency-resolution.
As far as I know there are 4096byte in the x-data-ram for the twiddlefactor-lookup-table. The given function needs only N/2 real and N/2 imag twiddlefactors, so I assume that the maximum possible number of bins is 4096.
Now I have the following questions:
1) Am I right?
2) Is there a possibility for further improvement? Do you know how much time it takes to calculate the twiddlefactors "on the fly" as well as calculate the fft? Just round about!
3) I am interested in the fft()-function. Can I get the code somewhere? I guess its written in assembler
4) Does this function and the bitreverse function allow so many twiddlefactors?
Those were my questions, thanx a lot in advance for every helpful answer,
Marius
Is N=4096 maximum for dsPIC fft-function?
Re: Is N=4096 maximum for dsPIC fft-function?
Here is the FFT source. It is written in ASM and you can study the comments in order to figure out the answers to your questions.
As for the twiddle factors, you calculate them using the formula:
Which means you can use more than 4096.
Concerning the RAM available, it depends on the chip you use. There is no general answer, just have a look at the datasheet and see if you can fit the factors in RAM.
As for the twiddle factors, you calculate them using the formula:
Code: Select all
//** Twiddle factors are defined as **
//** exp(-j*2*pi*k/N) **
//** k = 0, 1, ..., N-1 **
//** Twiddle factors are represented in 1.15 radix point format (fractional)
//** in sequence Re, Im, Re, Im...
Concerning the RAM available, it depends on the chip you use. There is no general answer, just have a look at the datasheet and see if you can fit the factors in RAM.
Code: Select all
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// This Complex DIF FFT implementation expects that the input data is a
// complex vector such that the magnitude of the real and imaginary parts
// of each of its elements is less than 0.5. If greater or equal to this
// value the results could produce saturation.
//
// Also, the program performs an implicit scaling of 1/2 for every stage
// to the intermediate values, so that the output values are scaled by a
// factor of 1/N, with N the length of the FFT.
//
// NOTE: input is expected in natural ordering, while output is produced
// in bit reverse ordering.
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
unit __Lib_Fft;
uses __Lib_utils, __Lib_TwiddleFactors;
implementation
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// _FFTComplexIP: Complex (in-place) DIF FFT.
//
// Operation:
// F(k) = 1/N*sum_n (f(n)*WN(kn)), WN(kn) = exp[-(j*2*pi*k*n)/N],
//
// n in {0, 1,... , N-1}, and
// k in {0, 1,... , N-1}, with N = 2^m.
//
// Input:
// w0 = number stages in FFT (log2N)
// w1 = ptr to complex source vector (srcCV)
// w2 = ptr to complex twiddle factors (twidFactors)
// w3 = COEFFS_IN_DATA, or memory program page with twiddle factors.
// Return:
// w0 = ptr to source vector (srcCV)
//
// System resources usage:
// {w0..w7} used, not restored
// {w8..w13} saved, used, restored
// AccuA used, not restored
// AccuB used, not restored
// CORCON saved, used, restored
// PSVPAG saved, used, restored (if factors in P memory)
//
// DO and REPEAT instruction usage.
// 2 level DO intruction
// no REPEAT intructions
//
// Program words (24-bit instructions):
// 60
//
// Cycles (including C-function call and return overheads):
//
// transform factors factors
// size in X-mem in P-mem
// -----------------------------------------
// 32-point 1664 1827
// 64-point 3802 4189
// 128-point 8612 9511
// 256-point 19310 21361
// 512-point 42872 47483
//
//............................................................................
// Local equates.
// Commented to avoid IAR assembler problems...
// .equ oTwidd,w3
// .equ pTwidd,w8
// .equ nGrps,w9
// .equ pUpper,w10
// .equ pLower,w11
// .equ groupCntr,w12
// .equ buttCntr,w13
// The symbols have been replaced in the code by the corresponding
// working registers// the comments maintain the symbols for clarity.
procedure FFT(log2N: word; TwiddleFactorsAddress: LongInt; var Samples: array[1024] of word);
var //adr: LongInt;
adr2: word; volatile;
begin
asm
push CORCON
push PSVPAG
end;
adr2 := TwiddleFactorsAddress and $FFFF;
CORCON := $F4;
asm
_FFTComplexIP:
//............................................................................
// Save working registers.
//push.d w8 // {w8,w9} to TOS
//push.d w10 // {w10,w11} to TOS
//push.d w12 // {w12,w13} to TOS
//............................................................................
//............................................................................
// move params to proper registers
mov [w14-8], w0 // number of samples log2N
mov [w14-14], w1 // ptr to source samples
mov [w14], w2 // ptr to lower word of twiddle factors' address
push w1 // save return value (srcCV)
// FFT proper.
mov #0x1,w9 // to be shifted...
sl w9,w0,w9 // w9 = N (1<<log2N)
mov #0x1,w3 // initialize twidds offset,
// also used as num butterflies
mov w2,w8 // w8-> WN(0) (real part)
// Preform all k stages, k = 1:log2N.
// NOTE: for first stage, 1 butterfly per twiddle factor (w3 = 1)
// and N/2 groups (w9 = N/2) for factors WN(0), WN(1), ..., WN(N/2-1).
mov w0,w13 // w13= num stages
// w0 available for reuse
_FFT_L02:
// Perform all butterflies in each stage, grouped per twiddle factor.
// Update counter for groups.
lsr w9,w9 // w9 = N/(2^k)
sl w9,#2,w12 // w12= lower offset
// nGrps+sizeof(fractcomplex)
// *2 bytes per element
// Set pointers to upper "leg" of butterfly:
mov w1,w10 // w10-> srcCV (upper leg)
// Perform all the butterflies in each stage.
dec w3,w4 // w4 = butterflies per group
do w4, _FFT_L04 // do 2^(k-1) butterflies
// Set pointer to lower "leg" of butterfly.
add w12,w10,w11 // w11-> srcCV + lower offset
// (lower leg)
// Prepare offset for twiddle factors.
sl w3,#2,w0 // oTwidd*sizeof(fractcomplex)
// *2 bytes per element
// Perform each group of butterflies, one for each twiddle factor.
dec w9,w4 // w4 = nGrps-1
do w4,_FFT_L03 // do (nGrps-1)+1 times
// no more nested do's...
// Butterfly proper.
clr a,[w8]+=2,w6,[w10]+=2,w4 // a = 0 (for dual fetch only)
// w6 = Wr
// pTwidd-> Wi
// w4 = Ar
// pUpper-> Ai (=Ci)
sub w4,[w11++],w4 // w4 = Ar - Br
// pLower-> Bi (=Di)
mpy w4*w6,a,[w8]-=2,w7,[w10]-=2,w5 // a = (Ar - Br)*Wr
// w7 = Wi
// pTwidd-> Wr
// w5 = Ai
// pUpper-> Ar (=Cr)
// Update twiddle pointer.
add w0,w8,w8 // pTwidd-> for next group
mpy w4*w7,b // b = (Ar - Br)*Wi
sub w5,[w11--],w5 // w5 = Ai - Bi
// pLower-> Br (=Dr)
msc w5*w7,a,[w11]+=2,w7 // a = (Ar - Br)*Wr
// - (Ai - Bi)*Wi = Dr
// w7 = Br
// pLower-> Bi (=Di)
mac w5*w6,b,[w11]-=2,w5 // b = (Ar - Br)*Wi
// + (Ai - Bi)*Wr = Di
// w5 = Bi
// pLower-> Br (=Dr)
sac.r a,#1,[w11++] // save 1/2*Dr
// pLower-> Bi (=Di)
sac.r b,#1,[w11++] // save 1/2*Di
// pLower-> Br (next)
lac [w10++],a // a = Ar
// pUpper-> Ai (=Ci)
lac [w10--],b // b = Ai
// pUpper-> Cr
add w7, a // a = Ar + Br
add w5, b // b = Ai + Bi
sac.r a, #1, [w10++] // save 1/2*Cr
// pUpper-> Ci
_FFT_L03:
sac.r b,#1,[w10++] // save 1/2*Ci
// pUpper-> Ar (next)
add w12,w10,w10 // w10-> upper leg (next set)
_FFT_L04:
mov w2,w8 // rewind twiddle pointer
// Update offset to factors.
sl w3,w3 // oTwidd *= 2
// Find out whether to perform another stage...
dec w13,w13 // w13= log2N - k -1
bra gt,_FFT_L02 // if W13 > 0, do next stage
_FFT_L05: // else, no more stages...
//............................................................................
pop w0 // restore return value
//............................................................................
// Restore PSVPAG and CORCON.
pop PSVPAG
pop CORCON
//............................................................................
// Restore working registers.
//pop.d w12 // {w12,w13} from TOS
//pop.d w10 // {w10,w11} from TOS
//pop.d w8 // {w8,w9} from TOS
//............................................................................
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
end; //asm
end;
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// OEF
end.
Hello,
I think there was a misunderstanding, I already knew that it is possible to use more than 4096 factors, N is always a power of two (radix-2 algorithm) so it can be much higher at all ;-)
After I read parts of the datasheet and your posted source I am a bit further.
I am using dsPIC30F6014A as controller with 4096 byte of X- and Y-dataspace [W8-W11] which your source is using and which is recommended for fast execution with the dsp MAC-instructions.
One fractional needs two bytes, so the maximum number of fractionals is 2048. Because I need the twiddlefactors in real and imaginary parts, there are at most 1024 factors possible.
My answer is, that the function is able to calculate FFT's with higher resolution, but I havent got enough dataspace for my samples and factors :-(
But is it possible to get higher resolution at the expense of execution time?
My application is'nt this time critical...(calculation may take up to 1 sec.)
Or do I have to take another dsp?
Thank you
So highest possible resolution with your routine is fa/N
I think there was a misunderstanding, I already knew that it is possible to use more than 4096 factors, N is always a power of two (radix-2 algorithm) so it can be much higher at all ;-)
After I read parts of the datasheet and your posted source I am a bit further.
I am using dsPIC30F6014A as controller with 4096 byte of X- and Y-dataspace [W8-W11] which your source is using and which is recommended for fast execution with the dsp MAC-instructions.
One fractional needs two bytes, so the maximum number of fractionals is 2048. Because I need the twiddlefactors in real and imaginary parts, there are at most 1024 factors possible.
My answer is, that the function is able to calculate FFT's with higher resolution, but I havent got enough dataspace for my samples and factors :-(
But is it possible to get higher resolution at the expense of execution time?
My application is'nt this time critical...(calculation may take up to 1 sec.)
Or do I have to take another dsp?
Thank you
So highest possible resolution with your routine is fa/N
You figured it out correctly. The answer is that you have to find the way to store more twiddle factors. What I would do would be to use a mass storage media (MMC) to store the factors, then I would transfer 1024 packets of the factors to dataspace. Finally, you have to modify the source I gave to you to make all of this possible.triplex wrote:But is it possible to get higher resolution at the expense of execution time?
My application is'nt this time critical...(calculation may take up to 1 sec.)
Or do I have to take another dsp?
Or get a bigger chip.
I tested FFT library starting from theoretical data and all works fine.
Like Triplex, I need more input data to increase the precision.
I use 9 stages and 512 values but some other variables make that the memory used is 54%. I cannot use 1024 values to occupy the totality of Y data-space.
Does somebody have a solution to move the space of the additional variables?
Daniel
Like Triplex, I need more input data to increase the precision.
I use 9 stages and 512 values but some other variables make that the memory used is 54%. I cannot use 1024 values to occupy the totality of Y data-space.
Does somebody have a solution to move the space of the additional variables?
Daniel
Daniel