Dear MikroE,
I am working on firmware that makes intensive use of square roots.
So I decided to use an STM32 with a floating-point unit. I then started looking at efficiency, because the application responds slowly.
Here is my question:
I have included the "Q31" library in the project.
Then I ran this simple test:
Basic (or C)
-------------
A1=9
A=sqrt(A1)
-------------
ASSEMBLER
------------------------
asm
VMOV.F32 S1,#9
VSQRT.F32 s1,s2
end asm
------------------------
Here is the lister output for the code written in assembly:
VMOV.F32 S1, #9
VSQRT.F32 S1, S2
Execution time @ 1 MHz:
VMOV.F32 S1, #9 ---> 1 µs (1 cycle)
VSQRT.F32 S1, S2 ---> 14 µs (14 cycles)
Here is the lister output for the code written in Basic (or C):
VMOV.F32 S0, #9 ---> 1 µs (1 cycle)
BL _sqrt+0 ---> 369 µs (369 cycles)
If you analyze the listing of the routine (_sqrt+0), you will see that it computes the square root indirectly, in software, instead of using the opcode implemented in the FPU!
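To illustrate what I mean by an indirect calculation, here is a minimal C sketch of an iterative (Newton-Raphson) square root. This is only an assumption about the general shape of a software routine, not the actual code behind _sqrt, and the name soft_sqrtf is made up for this example.

```c
/* Illustrative software square root using Newton-Raphson iteration.
   NOT the library's _sqrt routine; soft_sqrtf is a hypothetical name.
   It only shows why a software routine needs many instructions where
   the FPU needs a single VSQRT.F32. */
float soft_sqrtf(float x)
{
    if (x <= 0.0f) {
        return 0.0f;    /* sketch only: zero and negative inputs give 0 */
    }

    /* Crude starting guess; a real routine would pick a better one. */
    float guess = (x > 1.0f) ? x * 0.5f : 1.0f;

    /* Each pass costs one divide, one add and one multiply. */
    for (int i = 0; i < 32; ++i) {
        float next = 0.5f * (guess + x / guess);
        if (next == guess) {
            break;      /* converged to float precision */
        }
        guess = next;
    }
    return guess;
}
```

Every pass through the loop already costs a floating-point divide, so a handful of iterations plus the call overhead adds up to far more than the 14 cycles of a single VSQRT.F32.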
Is it a bug?
Let me know,
Alessandro
MikroBasic Efficiency
Re: MikroBasic Efficiency
Hi,
Which MCU are you using?
Regards,
Filip.
Re: MikroBasic Efficiency
Hi Filip,
I am working with MikroBasic 5.1.0 on Windows 10.
I am running this test while switching between an STM32F071V8 and an STM32L476VE/STM32F401VB.
Basically, I am switching between an M0 and an M4.
Here is another example:
------------
Basic (or C)
------------
dim A1,A2 as float
A1=A1*A2
11 cycles in total.
The compiled output uses the VMUL.F32 opcode.
-----------
Assembler:
-----------
asm
VMOV.F32 S2,#1.5
VMOV.F32 S1,#1.5
VMUL.F32 S2,S1
end asm
3 cycles in total.
----------------------------------------
I got the same good results with VDIV.F32, VADD.F32, VCMP.F32 and VCMPE.F32.
Here the efficiency is good.
The Basic test program also includes the transfers between the CPU and the FPU, so the compiler's implementation seems to be good.
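To put the figures from both tests side by side, here is a small C sketch of the arithmetic. The helper names cycles_to_us and slowdown are made up for this example, and I am assuming the same 1 MHz clock as in the square root test.

```c
/* Convert a cycle count to microseconds for a given core clock in Hz.
   Hypothetical helper, for illustration only. */
double cycles_to_us(unsigned long cycles, unsigned long clock_hz)
{
    return (double)cycles * 1e6 / (double)clock_hz;
}

/* Ratio between a slower and a faster measurement, both in cycles.
   Hypothetical helper, for illustration only. */
double slowdown(unsigned long slow_cycles, unsigned long fast_cycles)
{
    return (double)slow_cycles / (double)fast_cycles;
}
```

With the numbers measured so far, slowdown(369, 14) is about 26 for the square root, while slowdown(11, 3) is below 4 for the multiplication, and those 11 cycles include the CPU/FPU transfers. At 1 MHz, cycles_to_us(369, 1000000) gives the 369 µs reported for the call to _sqrt.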
I am looking into some other inconsistencies... I will report them in the next few days.
Thank you very much for your help!
Alessandro
Re: MikroBasic Efficiency
Hi,
Thank you for this explanation, I will pass it to our developers.
Regards,
Filip.