Is N=4096 maximum for dsPIC fft-function?

General discussion on mikroC for dsPIC30/33 and PIC24.
Post Reply
Author
Message
triplex
Posts: 10
Joined: 02 Apr 2008 09:50
Location: germany

Is N=4096 maximum for dsPIC fft-function?

#1 Post by triplex » 19 May 2008 17:40

Hello there,
I am on my way to get the fft working the way I need it.
That means I need higher frequency-resolution as well as a higher samplingrate.

The samplingrate was no problem, now I am working on the improvement of the frequency-resolution.

As far as I know there are 4096byte in the x-data-ram for the twiddlefactor-lookup-table. The given function needs only N/2 real and N/2 imag twiddlefactors, so I assume that the maximum possible number of bins is 4096.


Now I have the following questions:

1) Am I right?

2) Is there a possibility for further improvement? Do you know how much time it takes to calculate the twiddlefactors "on the fly" as well as calculate the fft? Just round about!

3) I am interested in the fft()-function. Can I get the code somewhere? I guess its written in assembler :-)

4) Does this function and the bitreverse function allow so many twiddlefactors?

Those were my questions, thanx a lot in advance for every helpful answer,

Marius

User avatar
zristic
mikroElektronika team
Posts: 6608
Joined: 03 Aug 2004 12:59
Contact:

Re: Is N=4096 maximum for dsPIC fft-function?

#2 Post by zristic » 20 May 2008 09:24

Here is the FFT source. It is written in ASM and you can study the comments in order to figure out the answers to your questions.

As for the twiddle factors, you calculate them using the formula:

Code: Select all

//**      Twiddle factors are defined as                                      **
//**        exp(-j*2*pi*k/N)                                                  **
//**        k = 0, 1, ..., N-1                                                **
//**      Twiddle factors are represented in 1.15 radix point format (fractional)
//**        in sequence Re, Im, Re, Im...    
Which means you can use more than 4096.

Concerning the RAM available, it depends on the chip you use. There is no general answer, just have a look at the datasheet and see if you can fit the factors in RAM.

Code: Select all

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// This Complex DIF FFT implementation expects that the input data is a
// complex vector such that the magnitude of the real and imaginary parts
// of each of its elements is less than 0.5. If greater or equal to this
// value the results could produce saturation.
//
// Also, the program performs an implicit scaling of 1/2 for every stage
// to the intermediate values, so that the output values are scaled by a
// factor of 1/N, with N the length of the FFT.
//
// NOTE: input is expected in natural ordering, while output is produced
// in bit reverse ordering.
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
unit __Lib_Fft;

uses __Lib_utils, __Lib_TwiddleFactors;

implementation

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// _FFTComplexIP: Complex (in-place) DIF FFT.
//
// Operation:
//	F(k) = 1/N*sum_n (f(n)*WN(kn)), WN(kn) = exp[-(j*2*pi*k*n)/N],
//
// n in {0, 1,... , N-1}, and
// k in {0, 1,... , N-1}, with N = 2^m.
//
// Input:
//	w0 = number stages in FFT (log2N)
//	w1 = ptr to complex source vector (srcCV)
//	w2 = ptr to complex twiddle factors (twidFactors)
//	w3 = COEFFS_IN_DATA, or memory program page with twiddle factors.
// Return:
//	w0 = ptr to source vector (srcCV)
//
// System resources usage:
//	{w0..w7}	used, not restored
//	{w8..w13}	saved, used, restored
//	 AccuA		used, not restored
//	 AccuB		used, not restored
//	 CORCON		saved, used, restored
//	 PSVPAG		saved, used, restored (if factors in P memory)
//
// DO and REPEAT instruction usage.
//	2 level DO intruction
//	no REPEAT intructions
//
// Program words (24-bit instructions):
//	60
//
// Cycles (including C-function call and return overheads):
//
//	transform	factors		factors
//	   size		in X-mem	in P-mem
//	-----------------------------------------
//	 32-point	   1664		   1827
//	 64-point	   3802		   4189
//	128-point	   8612		   9511
//	256-point	  19310		  21361
//   512-point     42872       47483
//
//............................................................................

	// Local equates.
	// Commented to avoid IAR assembler problems...
//	.equ	oTwidd,w3
//	.equ	pTwidd,w8
//	.equ	nGrps,w9
//	.equ	pUpper,w10
//	.equ	pLower,w11
//	.equ	groupCntr,w12
//	.equ	buttCntr,w13
	// The symbols have been replaced in the code by the corresponding
	// working registers// the comments maintain the symbols for clarity.

procedure FFT(log2N: word; TwiddleFactorsAddress: LongInt; var Samples: array[1024] of word);
var //adr: LongInt;
    adr2: word; volatile;
begin
  asm
    push  CORCON
    push	PSVPAG
  end;

  adr2 := TwiddleFactorsAddress and $FFFF;
  CORCON := $F4;
  asm
    _FFTComplexIP:

    //............................................................................

    	// Save working registers.
      //push.d	w8				// {w8,w9} to TOS
      //push.d	w10				// {w10,w11} to TOS
      //push.d	w12				// {w12,w13} to TOS

    //............................................................................


    //............................................................................


      // move params to proper registers
      mov  [w14-8], w0   // number of samples log2N
      mov  [w14-14],  w1 // ptr to source samples
      mov  [w14],  w2    // ptr to lower word of twiddle factors' address

     push	w1				     // save return value (srcCV)

    	// FFT proper.
    	mov	#0x1,w9				 // to be shifted...
    	sl	w9,w0,w9			 // w9 = N (1<<log2N)
    	mov	#0x1,w3				 // initialize twidds offset,
    						         // also used as num butterflies
    	mov	w2,w8				   // w8-> WN(0) (real part)

    	// Preform all k stages, k = 1:log2N.
    	// NOTE: for first stage, 1 butterfly per twiddle factor (w3 = 1)
    	// and N/2 groups  (w9 = N/2) for factors WN(0), WN(1), ..., WN(N/2-1).
    	mov	w0,w13				 // w13= num stages
    						         // w0 available for reuse
    _FFT_L02:

    	// Perform all butterflies in each stage, grouped per twiddle factor.

    	// Update counter for groups.
    	lsr	w9,w9				   // w9 = N/(2^k)
    	sl	w9,#2,w12			 // w12= lower offset
    						         // nGrps+sizeof(fractcomplex)
    						         // *2 bytes per element

    	// Set pointers to upper "leg" of butterfly:
    	mov	w1,w10				 // w10-> srcCV (upper leg)

    	// Perform all the butterflies in each stage.
    	dec	w3,w4				   // w4 = butterflies per group
    	do	w4, _FFT_L04	 // do 2^(k-1) butterflies

    	// Set pointer to lower "leg" of butterfly.
    	add	w12,w10,w11		 // w11-> srcCV + lower offset
    						         // (lower leg)
    	// Prepare offset for twiddle factors.
    	sl	w3,#2,w0			 // oTwidd*sizeof(fractcomplex)
    						         // *2 bytes per element

    	// Perform each group of butterflies, one for each twiddle factor.
    	dec	w9,w4				   // w4 = nGrps-1
    	do	w4,_FFT_L03		 // do (nGrps-1)+1 times
    						         // no more nested do's...
    	// Butterfly proper.
    	clr	a,[w8]+=2,w6,[w10]+=2,w4	// a  = 0 (for dual fetch only)
    						                    // w6 = Wr
    						                    // pTwidd-> Wi
    						                    // w4 = Ar
    						                    // pUpper-> Ai (=Ci)
    	sub	w4,[w11++],w4	 // w4 = Ar - Br
    						         // pLower-> Bi (=Di)
    	mpy	w4*w6,a,[w8]-=2,w7,[w10]-=2,w5	// a  = (Ar - Br)*Wr
    						                          // w7 = Wi
    						                          // pTwidd-> Wr
    						                          // w5 = Ai
    						                          // pUpper-> Ar (=Cr)
    	// Update twiddle pointer.
    	add	w0,w8,w8			 // pTwidd-> for next group

    	mpy	w4*w7,b				 // b  = (Ar - Br)*Wi
    	sub	w5,[w11--],w5	 // w5 = Ai - Bi
    						         // pLower-> Br (=Dr)
    	msc	w5*w7,a,[w11]+=2,w7		// a  = (Ar - Br)*Wr
    						                //    - (Ai - Bi)*Wi = Dr
    						                // w7 = Br
    						                // pLower-> Bi (=Di)
    	mac	w5*w6,b,[w11]-=2,w5		// b  = (Ar - Br)*Wi
    						                //    + (Ai - Bi)*Wr = Di
    						                // w5 = Bi
    						                // pLower-> Br (=Dr)
    	sac.r	a,#1,[w11++]			  // save 1/2*Dr
    						                // pLower-> Bi (=Di)
    	sac.r	b,#1,[w11++]			  // save 1/2*Di
    						                // pLower-> Br (next)
    	lac	[w10++],a			        // a  = Ar
    						                // pUpper-> Ai (=Ci)
    	lac	[w10--],b			 // b  = Ai
    						         // pUpper-> Cr
    	add	w7, a				   // a  = Ar + Br
    	add	w5, b				   // b  = Ai + Bi
    	sac.r	a, #1, [w10++]			// save 1/2*Cr
    						                // pUpper-> Ci
    _FFT_L03:
    	sac.r	b,#1,[w10++]			  // save 1/2*Ci
                             		// pUpper-> Ar (next)

    	add	w12,w10,w10		 // w10-> upper leg (next set)
   _FFT_L04:
    	mov	w2,w8				   // rewind twiddle pointer

    	// Update offset to factors.
    	sl	w3,w3				   // oTwidd *= 2

    	// Find out whether to perform another stage...
    	dec	w13,w13				 // w13= log2N - k -1
    	bra	gt,_FFT_L02		 // if W13 > 0, do next stage
    _FFT_L05:					   // else, no more stages...

    //............................................................................

    	pop	w0				     // restore return value


    //............................................................................

    	// Restore PSVPAG and CORCON.
    	pop	PSVPAG
    	pop	CORCON

    //............................................................................

    	// Restore working registers.
    	//pop.d	w12				// {w12,w13} from TOS
    	//pop.d	w10				// {w10,w11} from TOS
    	//pop.d	w8				// {w8,w9} from TOS

    //............................................................................

    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  end; //asm

end;

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// OEF


end.

triplex
Posts: 10
Joined: 02 Apr 2008 09:50
Location: germany

#3 Post by triplex » 20 May 2008 09:28

Thank you very much so far!

triplex
Posts: 10
Joined: 02 Apr 2008 09:50
Location: germany

#4 Post by triplex » 21 May 2008 07:23

Hello,

I think there was a misunderstanding, I already knew that it is possible to use more than 4096 factors, N is always a power of two (radix-2 algorithm) so it can be much higher at all ;-)

After I read parts of the datasheet and your posted source I am a bit further.
I am using dsPIC30F6014A as controller with 4096 byte of X- and Y-dataspace [W8-W11] which your source is using and which is recommended for fast execution with the dsp MAC-instructions.

One fractional needs two bytes, so the maximum number of fractionals is 2048. Because I need the twiddlefactors in real and imaginary parts, there are at most 1024 factors possible.

My answer is, that the function is able to calculate FFT's with higher resolution, but I havent got enough dataspace for my samples and factors :-(

But is it possible to get higher resolution at the expense of execution time?
My application is'nt this time critical...(calculation may take up to 1 sec.)
Or do I have to take another dsp?

Thank you


So highest possible resolution with your routine is fa/N

triplex
Posts: 10
Joined: 02 Apr 2008 09:50
Location: germany

#5 Post by triplex » 21 May 2008 07:25

Forget about the sentence after Thank you :-)

I know thats wrong!

User avatar
zristic
mikroElektronika team
Posts: 6608
Joined: 03 Aug 2004 12:59
Contact:

#6 Post by zristic » 21 May 2008 08:35

triplex wrote:But is it possible to get higher resolution at the expense of execution time?
My application is'nt this time critical...(calculation may take up to 1 sec.)
Or do I have to take another dsp?
You figured it out correctly. The answer is that you have to find the way to store more twiddle factors. What I would do would be to use a mass storage media (MMC) to store the factors, then I would transfer 1024 packets of the factors to dataspace. Finally, you have to modify the source I gave to you to make all of this possible.

Or get a bigger chip.

idakota
Posts: 334
Joined: 27 Sep 2006 08:07
Location: Pretoria/South Africa
Contact:

#7 Post by idakota » 21 May 2008 09:34

Anybody used the new TI C6455 DSPs? Used it in my DSP course that I just finished. What a chip :D Up to 1.2GHz/8000MIPS!

ddpt
Posts: 14
Joined: 02 Jul 2007 09:37
Location: South France

#8 Post by ddpt » 01 Oct 2008 16:31

I tested FFT library starting from theoretical data and all works fine.
Like Triplex, I need more input data to increase the precision.
I use 9 stages and 512 values but some other variables make that the memory used is 54%. I cannot use 1024 values to occupy the totality of Y data-space.
Does somebody have a solution to move the space of the additional variables?
Daniel
Daniel

Post Reply

Return to “mikroC for dsPIC30/33 and PIC24 General”