Tips for optimising code size


deandob · #1 Post by **deandob** » 17 Mar 2010 22:50

All,

This is my first post here, so first let me thank the MikroElektronika team for producing a great product, and especially for making it easy for people with Visual Basic experience: the transition to MikroBasic has been painless, and I also appreciate the Visual Studio-like IDE. I was programming PICs in ASM about 8 years back on a hobby project and found the learning curve fairly hard for a beginner, but did manage to get some fairly complex coding done. Now I'm coming back to the PIC for another project and I am considerably more productive with the MikroElektronika tools. I also appreciate that the compiler/IDE is free for code under 2K, which makes it painless to use MikroElektronika for hobby work, as you can do a hell of a lot in 2K, especially if you optimise your code (the topic of this post!).

My question is about how to optimise my code for ROM size. I made a bulk purchase of 16F628 chips with the 2K ROM limit for a hobby project (home automation), which I thought would be sufficient for my needs, but like all good projects the functionality scope has blown out, and I am running short of code space. Comparing my old ASM code to the ASM output of MikroBasic, I have noticed that there is an efficiency difference, but it's not huge and it's to be expected. It looks like by the time I finish debugging I will end up about 100 instructions over the 2K of ROM, and I expect I should be able to make the final code fit with hand optimisation.

However I have spent a couple of hours searching this forum for optimisation tips and am surprised that there are not more discussions, or even a FAQ / sticky, on the best tips for optimisation - possibly because the standard answer is to buy a larger PIC, which is not always the right answer. Most posts are about optimising a particular code snippet; I could not locate a post with a collection of more general tips (hence I thought I'd start this topic).

Another consideration is that a smaller code base most of the time means faster running code, so the benefit is not just less ROM used, although the downside is that you may end up with code that is less obvious/readable (for maintenance). I was also surprised that a few of the techniques I found by experimenting with the compiler have not been posted, so I thought I would share my findings, collect other tips from this community, and ask for optimisation help with a few of my routines that I am stuck on.

I see member 'xor' has been active in helping people with optimising, so I'm hoping he will chime in, or at least provide links to similar topics that I may have missed in my search (although I have spent a couple of hours searching / reading!). Some of the tips below are obvious (to experienced folks) but are still useful reminders.

So for my tips:
1) Fewer lines of Basic does not always mean tighter ASM code. I have noted that when I combine functions on a single line, about 50% of the time the compiler uses fewer instructions if the expression is decomposed into separate lines. I have noted a couple of examples below, but as a general rule I will try splitting a more complex expression over several simpler lines and check the ROM difference with a build (usually quick to try).

2) Use simple types. As soon as you start using more complex types (eg. words), the compiler has to split your variable into different registers and shuffle the registers around etc.

3) Always do integer arithmetic. Any floating point or more complex mathematical expressions will always blow the code out. You may need to make approximations in your code and live with a margin of error, but if accuracy is not important you will have smaller (and faster) code. Use addition / subtraction over multiplication / division if possible.

4) If you can multiply or divide by a power of 2 (eg. divide by 2), you can save a bit of code by using the << or >> operator instead of multiplication or division. Sometimes the compiler will optimise for shifts but I have seen a couple of scenarios where it didn't.
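As an off-target sanity check of tip 4, the equivalence between shifts and power-of-2 arithmetic can be modelled in a few lines of Python. This is only a sketch for unsigned 8-bit values; the function names are made up for illustration, and signed types are not covered.

```python
# Model of tip 4 for unsigned bytes: shifting by n matches dividing or
# multiplying by 2**n, with the product truncated to 8 bits.
# div_pow2 / mul_pow2 are illustrative names, not compiler routines.

def div_pow2(value: int, n: int) -> int:
    """Unsigned divide by 2**n, written as a right shift."""
    return (value & 0xFF) >> n


def mul_pow2(value: int, n: int) -> int:
    """Unsigned multiply by 2**n, written as a left shift, kept to 8 bits."""
    return ((value & 0xFF) << n) & 0xFF


# Exhaustive check over every byte value and every shift count.
for v in range(256):
    for n in range(8):
        assert div_pow2(v, n) == v // (2 ** n)
        assert mul_pow2(v, n) == (v * 2 ** n) & 0xFF
```

The check only says the results agree; whether the shift is actually smaller in ROM still has to be confirmed in the ASM listing.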

5) Do not put multiple condition tests in the same IF statement. I'm not sure why the compiler does not optimise this well, but a statement like:
IF (Rx_Buffer[Data0] = DataSize) AND (RxPtr <> 0) THEN ....
can have the AND condition split into 2 lines, which is more lines of Basic but less ASM code, like:
IF Rx_Buffer[Data0] = DataSize THEN
IF RxPtr <> 0 THEN .....
For OR conditions, use a local boolean variable before the IF/THEN test and assign your boolean to the logic test, then test the boolean in the IF/THEN (although I have not been able to save any space doing this). The next version of the compiler with SSA optimisations will be smarter about optimising more complex IF/THEN constructs.

6) It does help to learn and write ASM, as you will get a better appreciation of how the 'machine' runs. If you can keep your Basic code as close as possible to how it would work in ASM, the code is always tighter. All the high level programming constructs come at a cost (more code and slower), as the compiler has to decompose your code down to ASM anyway.

7) A good example of the above is use of boolean operators. You can do some pretty fancy integer operations/tests with boolean logic although sometimes it takes a bit of thinking/experimentation to get it right (which I would never bother with in Visual Basic as it just didn't matter). An example is using XOR to test for a change in bits (eg. a port), you can save many instructions compared to using a higher language construct like looping through a variable's bits like 'variableX.Loop'.
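A small Python model of the XOR trick from tip 7, assuming an 8-bit port that is sampled twice. The function names are hypothetical; the point is that one XOR (plus one AND per edge direction) replaces a loop over the individual bits.

```python
# Model of tip 7: XOR of the previous and current port sample gives a mask
# with a 1 in every bit position that changed. Names are illustrative only.

def changed_bits(previous: int, current: int) -> int:
    """Bits that differ between two 8-bit samples."""
    return (previous ^ current) & 0xFF


def rising_bits(previous: int, current: int) -> int:
    """Bits that changed and are now 1 (0 -> 1 transitions)."""
    return changed_bits(previous, current) & current


def falling_bits(previous: int, current: int) -> int:
    """Bits that changed and were previously 1 (1 -> 0 transitions)."""
    return changed_bits(previous, current) & previous


old_sample = 0b10100000
new_sample = 0b10010001
assert changed_bits(old_sample, new_sample) == 0b00110001
assert rising_bits(old_sample, new_sample) == 0b00010001
assert falling_bits(old_sample, new_sample) == 0b00100000
```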

8) Replace your Basic code with more optimised inline ASM. This needs a bit of extra work and the code is harder to read but it is a good way to optimise code. Usually you will need to read the compiler ASM output listing and work out a more efficient way in ASM, and replace the Basic code with inline ASM. Example below.

9) When doing multiple shifts << or >>, or power of 2 division/multiplication, you can optimise the ASM by removing the instruction that clears the rotated-in carry bit after each shift. The PIC rotate instructions (RRF/RLF) rotate through the carry flag, and the compiler can't tell what you are doing, so it plays safe by assuming you don't want the carry (which most times you don't). You can AND out the redundant carry bits at the end. For example when processing nibbles:
Variable >> 4
compiles to:
MOVF FARG_LCDWriteByte_LCDByte+0, 0
MOVWF FARG_SetOutputs_OutVal+0
RRF FARG_SetOutputs_OutVal+0, 1
BCF FARG_SetOutputs_OutVal+0, 7
RRF FARG_SetOutputs_OutVal+0, 1
BCF FARG_SetOutputs_OutVal+0, 7
RRF FARG_SetOutputs_OutVal+0, 1
BCF FARG_SetOutputs_OutVal+0, 7
RRF FARG_SetOutputs_OutVal+0, 1
BCF FARG_SetOutputs_OutVal+0, 7
But could be replaced with:
MOVF FARG_LCDWriteByte_LCDByte+0, 0
MOVWF FARG_SetOutputs_OutVal+0
RRF FARG_SetOutputs_OutVal+0, 1
RRF FARG_SetOutputs_OutVal+0, 1
RRF FARG_SetOutputs_OutVal+0, 1
RRF FARG_SetOutputs_OutVal+0, 1
MOVLW 15 ' %00001111
ANDWF FARG_SetOutputs_OutVal+0, 1
(in this case use of the ASM SWAPF command would be even better)
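The claim in tip 9 can be checked without a PIC by modelling RRF in Python. The sketch below assumes RRF rotates right through the carry flag (old bit 0 goes to carry, old carry goes to bit 7), and compares the compiler's sequence (10 instructions in the listing above), the hand-optimised sequence (8 instructions) and a SWAPF-plus-mask variant. All function names are made up for the illustration.

```python
# Model of tip 9: three ways of computing (value >> 4) on an 8-bit register.
# rrf() imitates the PIC RRF instruction; the other names are illustrative.

def rrf(value: int, carry: int) -> tuple[int, int]:
    """Rotate right through carry. Returns (new_value, new_carry)."""
    new_carry = value & 1
    new_value = ((carry << 7) | (value >> 1)) & 0xFF
    return new_value, new_carry


def shift4_compiler(value: int, carry: int) -> int:
    """Compiler output: RRF followed by BCF reg,7, four times."""
    for _ in range(4):
        value, carry = rrf(value, carry)
        value &= 0x7F  # BCF reg, 7 clears the bit rotated in from carry
    return value


def shift4_masked(value: int, carry: int) -> int:
    """Hand version: four plain RRFs, then one AND with %00001111."""
    for _ in range(4):
        value, carry = rrf(value, carry)
    return value & 0x0F


def shift4_swapf(value: int) -> int:
    """SWAPF variant: swap the nibbles, then mask off the upper one."""
    swapped = ((value << 4) | (value >> 4)) & 0xFF
    return swapped & 0x0F


# All three agree with a true shift for every byte and either carry state.
for v in range(256):
    for c in (0, 1):
        assert shift4_compiler(v, c) == v >> 4
        assert shift4_masked(v, c) == v >> 4
    assert shift4_swapf(v) == v >> 4
```

The mask at the end is what makes the hand version safe: the upper four bits hold whatever was rotated in from the carry and from the low nibble, and the AND discards them.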

10) Use of local variable scope over globals. I use a lot of state machines in my code, and globals are perfect especially as you usually want to keep your ISR processing to a minimum and save the heavy processing for the main loop. However I have found that over use of globals adds bloat sometimes to the code because the compiler has to perform bank switching when accessing PIC RAM in different locations. Using local variables also has the opportunity to reduce RAM use if the compiler can use an available register rather than RAM.

11) To reduce the amount of bank switching the compiler must do when accessing globals, try to group the globals that will be used together into the same memory bank, by forcing the compiler to place these variables at chosen RAM locations using the 'absolute' directive when declaring the variable. You will need to study your PIC datasheet to work out the relevant memory banks (eg. PIC 16F628 has 4 memory banks that are quite fragmented and shared with the registers). You can also hand optimise the code by replacing Basic with ASM and removing the unnecessary bank switching (be careful - the compiler assumes the banks are not changed, so you have to restore the bank settings at the end of your ASM, else you can introduce nasty bugs). It's preferable to keep the most commonly used variables in Bank0. Credit for this tip - Janni.

12) Always view the ASM listing after adding/changing code, looking for large chunks of ASM code for small lines of Basic. Also always look at the change in the number of ROM instructions used, reported at the end of the compiler output. Be curious and try a couple of different ways of doing the same thing. I have found a number of times that a small change to the Basic expression can make a difference to the code used, and a lot of the time the change was not obvious.

13) The MikroElektronika libraries are generally not code efficient. I have pretty much found I can beat them in efficiency if I write equivalent functions, as I have the luxury of coding to my specific requirements, not for the general case as the libraries have to be (and note that this is not a fault of MikroElektronika, their libraries have to be written for the general case for multi-use). For example, replacing the UART init routine with the equivalent Basic statements below saved a bit of space and was easy to do:
SPBRG = 129 ' 9600
TXSTA = %00100100 ' Async, 8 bit, enable Tx, High baud rate, clear transmit shift register, clear 9th bit
RCSTA = %10010000 ' 144 Enable serial port, enable continuous receive
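For anyone adapting these register values, here is a hedged Python sketch of where SPBRG = 129 comes from. It uses the asynchronous baud rate formula for the PIC16 USART, Fosc / (16 * (SPBRG + 1)) with BRGH = 1 and Fosc / (64 * (SPBRG + 1)) with BRGH = 0. The 20 MHz clock is an assumption (it is not stated in the post, but it is the value for which 129 gives roughly 9600 baud), and the helper names are made up.

```python
# Baud rate arithmetic behind "SPBRG = 129 ' 9600".
# ASSUMPTION: 20 MHz oscillator; the post does not state the clock speed.
FOSC_HZ = 20_000_000
TXSTA = 0b00100100  # value from the post
BRGH = (TXSTA >> 2) & 1  # bit 2 of TXSTA selects the high-speed divisor
TXEN = (TXSTA >> 5) & 1  # bit 5 of TXSTA enables the transmitter


def baud_rate(fosc_hz: int, spbrg: int, brgh: int) -> float:
    """Actual asynchronous baud rate for a given SPBRG value."""
    divisor = 16 if brgh else 64
    return fosc_hz / (divisor * (spbrg + 1))


def spbrg_for(fosc_hz: int, target_baud: int, brgh: int) -> int:
    """Nearest SPBRG value for a target baud rate."""
    divisor = 16 if brgh else 64
    return round(fosc_hz / (divisor * target_baud)) - 1


assert BRGH == 1 and TXEN == 1
assert spbrg_for(FOSC_HZ, 9600, BRGH) == 129

actual = baud_rate(FOSC_HZ, 129, BRGH)
error_percent = (actual - 9600) / 9600 * 100
print(f"actual baud {actual:.1f}, error {error_percent:.2f}%")
```

With a different crystal the same two functions give the SPBRG value to use and the resulting error, which is worth checking before hard-coding the register.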

14) Use of symbols. Not only do they increase the readability of the code, replacing a numeric constant with something more readable/maintainable, but they also let you move calculations offline so that the PIC does not have to do them at runtime (the compiler will calculate them at compile time).

15) When one is pretty sure of one's manual optimisation, setting the compiler optimisation level to 0 (or a lower level) may lead to smaller final code. That's because higher settings switch on an optimisation mechanism that's efficient for some groups of statements (mostly expressions), but less efficient for others (mostly the well written ones). Credit for this tip - Janni.

16) When optimising your code by re-writing certain lines/sections (eg. because the assembly listing shows them to be inefficient, usually with unnecessary bank switching), you may be surprised to see your optimisation actually use up more code space than before, typically because the compiler optimiser for whatever reason gets tripped up. You will usually see this in the ASM listing as multiple bank (RP0/1) switches. Try changing your optimisation levels and re-arranging your code; sometimes you are better off leaving that section of code un-optimised.

17) For the PRO version of the compiler, it's OK to use the compiler scratch variables R0 - Rx (see listing) directly in your code as temporary variables, as these are always in bank0 and can end up saving ROM and RAM. However be careful with this: I have found some corruption when using the internal variables, probably because their scope is global and the compiler makes some assumptions, so only do this when you know you won't be overwriting variables in another routine (eg. OK to do for top level routines, not OK for interrupt routines etc.) Credit for this tip - Janni.

18) If you only use a function once, put it inline rather than in a separate function. Your code won't be as modular but you will save space by not having to save variables passed to the function and making the call/return.

19) Pass complex types by reference not by value (eg. strings) into subroutines. This saves the compiler having to create copies of your variable and passing it around. Do not make functions of which the returned value is of a complex type. Instead use a "byref" parameter and put the result in there. Credit for this tip - Dany.

20) For simple IF THEN ELSE statements, you can save a couple of ROM instructions by putting your ELSE statements before the IF condition test and remove the ELSE so the default condition is always executed and only if the test is true will the alternative conditions be run. This will only work for simple statements like changing a variable depending on the outcome of the condition test but this happens a lot in PIC code. For example:
if IR_DETECT_PIN = IR_CARRIER then ' For the odd counter, if the IR Port is high, we have a '1'
IRAddr.0 = 1
else
IRAddr.0 = 0
end if
Can become (saving 1 instruction):
IRAddr.0 = 0
if IR_DETECT_PIN = IR_CARRIER then ' For the odd counter, if the IR Port is high, we have a '1'
IRAddr.0 = 1
end if
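The rewrite in tip 20 is safe only because both forms leave the bit in the same final state. A minimal Python model of the two forms, with hypothetical function names and the address register passed in as a plain integer:

```python
# Model of tip 20: IF/ELSE versus "assign the default, then override".
# Both functions set bit 0 of addr according to whether pin == carrier.

def with_else(pin: int, carrier: int, addr: int) -> int:
    """Original form: one assignment in each branch."""
    if pin == carrier:
        addr |= 0x01
    else:
        addr &= 0xFE
    return addr


def default_first(pin: int, carrier: int, addr: int) -> int:
    """Rewritten form: write the ELSE value first, then override if true."""
    addr &= 0xFE
    if pin == carrier:
        addr |= 0x01
    return addr


# Same result for every combination of inputs and starting value of bit 0.
for pin in (0, 1):
    for carrier in (0, 1):
        for addr in (0x00, 0x01, 0xA4, 0xA5):
            assert with_else(pin, carrier, addr) == default_first(pin, carrier, addr)
```

One thing the model does not show: in the rewritten form the bit is briefly 0 even when it ends up as 1. That is harmless for an ordinary variable, but it would matter if the target were an output pin or a flag read by an interrupt routine.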

21) Use of WHILE WEND instead of FOR NEXT loops. Depending on your loop construct, the WHILE WEND construct is closer to the native ASM bit-test-then-skip condition tests and in some circumstances compiles to fewer ROM instructions. I have not found this to be consistent and it seems to work better in simpler loops.

22) Use procedures instead of functions if you have no need to return a value. In higher level languages we are used to writing functions and passing back return values to the calling routine, even if we don't use the returned information. In PICs, a lot of the time you don't have the space or 'horsepower' to do anything with the returned value (like error management), so use procedures instead to save 4 - 6 instructions.

As a beginner in MikroBasic I have only scratched the surface with the above, and I'm sure more experienced MikroBasic developers can add a lot more tips and even expand (and correct!) my tips above. I'll post the particular routines I'm having difficulty with in a later post in this thread, and I'll compile (pun intended) in this post the aggregate tips that people post, as a reference for readers.

Please share your tips by replying to this thread!

Regards,
Dean

deandob · #2 Post by **deandob** » 26 Mar 2010 21:34

Looks like there is not much interest in discussing optimising code (80+ views and no one willing to share tips / experiences).

I have a specific question about optimisation that hopefully someone can answer. Looking through the assembly listing I see a lot of extra code used to set the register banks back and forth, sometimes for no reason at all. Take the assembly snippet below, which performs an AND on one variable and saves the result in another. In this code the register bank is switched twice, because the variables are stored in different banks.

;Framework.mbas,765 :: TempByte1 = LCDByte AND %00001111
0x0113 0x300F MOVLW 15
0x0114 0x1683 BSF STATUS, 5
0x0115 0x1303 BCF STATUS, 6
0x0116 0x056F ANDWF FARG_LCDWriteByte_LCDByte, 0
0x0117 0x1283 BCF STATUS, 5
0x0118 0x1703 BSF STATUS, 6
0x0119 0x00A2 MOVWF FARG_SetOutputs_OutVal

Is there any way to have the compiler be smarter about the location of the variables so that the bank switching is not needed? I would say that 20% of my code is used up by bank switching. Or is the only option to do hand profiling, work out which variables are used in each routine, and use the 'absolute' declaration to hard code the variable locations in memory?
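To see where the two variables in the listing above actually live, the opcodes can be decoded by hand or with a short script. The Python sketch below assumes the standard 14-bit mid-range PIC16 encoding (BCF is 01 00bb bfff ffff, BSF is 01 01bb bfff ffff, and byte-oriented file instructions carry a 7-bit in-bank address in their low bits) and that STATUS bits 5 and 6 are RP0 and RP1. It handles only the instruction types that occur in this listing, and the function name is made up.

```python
# Decode the opcodes from the listing to find the full RAM address of each
# file-register access. Covers only the instruction types in this listing.
LISTING = [0x300F, 0x1683, 0x1303, 0x056F, 0x1283, 0x1703, 0x00A2]

STATUS = 0x03  # in-bank address of the STATUS register


def trace_accesses(opcodes: list[int], bank: int = 0) -> list[int]:
    """Follow the RP0/RP1 changes and return the full address of each access."""
    accesses = []
    for op in opcodes:
        top4 = (op >> 10) & 0xF   # upper 4 bits of the 14-bit instruction
        f = op & 0x7F             # 7-bit in-bank file address
        if top4 in (0b0100, 0b0101) and f == STATUS:
            bit = (op >> 7) & 0x7
            if bit in (5, 6):     # RP0 -> bank bit 0, RP1 -> bank bit 1
                mask = 1 << (bit - 5)
                bank = (bank | mask) if top4 == 0b0101 else (bank & ~mask)
        elif (op >> 12) == 0b00:  # byte-oriented file instruction (ANDWF, MOVWF)
            accesses.append((bank << 7) | f)
        # anything else (here only MOVLW) does not touch file registers
    return accesses


addresses = trace_accesses(LISTING)
assert addresses == [0xEF, 0x122]
print([hex(a) for a in addresses])
```

Under those assumptions the AND reads from 0xEF (bank 1) and the result is stored at 0x122 (bank 2), which is why both RP bits have to change between the two instructions.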

Please share your tips in this thread. Thanks.

janni · #3 Post by **janni** » 27 Mar 2010 16:50

Excellent work, and I mean it. Do not feel discouraged by the lack of responses - not everybody has time to follow the forum every day, and you've written your compilation mostly to be read, haven't you?

You haven't spotted many optimisation posts, as not many people are determined enough (or have enough knowledge of assembly) to care about it (or be able to do it). Others do it, but share the knowledge by helping in particular cases. Anyway, all the rules may be found by following what you've written:

deandob wrote:Always view the ASM listing after adding/changing new code looking for large chunks of ASM code for small lines of Basic. Also always look at the changes in the number of instructions used in the ROM at the end of the compiler output when compiling.

and some of them are pretty obvious (though maybe not to everyone - which makes your good work all the more worthwhile).

Some explanations and tips to your rules.

ad 5) - Logical statements follow the same rule as other statements - dividing them into simpler ones usually leads to shorter final code. Thus, using a boolean variable in conditional statements allows its value to be evaluated in simple statements before the conditional test. In other words, instead of using if (...) or (...), it's better to use if bool, and evaluate the bool variable earlier.

ad 10) - Locals are mostly replaced by internal registers (Rx) and operation on these is faster (less, or no bank switching in PIC16s and access bank use in PIC18s). In PIC18s the same effect may be obtained with globals declared in access bank (see below).

deandob wrote:I have a specific question about optimisation hopefully someone can answer. Looking through the assembly listing I see a lot of extra code used to set the register banks back and forward, sometimes for no reason at all.

In PIC16s, one may use the absolute directive to make sure that the most used (or used together) variables are placed in the same memory bank. In PIC18s the access bank is best for frequently used global vars.

P.S. Next release of mB will have SSA optimisation, which is supposed to better optimise expression evaluation. Possibly, there'll be no need to divide complex expressions into simpler ones as whole groups of expressions will be optimised anyway.

Dany · #4 Post by **Dany** » 27 Mar 2010 20:13

Hi Dean,

I think your analysis and suggestions are also important to a large extent for mikroPascal users. The compilers will not differ that much and some of the suggestions are not so language dependent.

My request now is: may I place a link to this forum thread on my PIC/mikroPascal website?
It would be placed in the section "Tips" (http://users.edpnet.be/rosseel01/DRO/PIC/index.htm#Tips), item "Code Size and Speed Tips".

Thanks in advance
and keep up the good work.

basicbasic111 · #5 Post by **basicbasic111** » 27 Mar 2010 21:19

deandob
Thank you so much!

Your post is like a good book for newbies like me.

I will surely read it word by word.

deandob · #6 Post by **deandob** » 27 Mar 2010 21:46

Thanks guys for the responses - good to see this thread kickstarted. I am using MikroBasic Pro as a hobbyist with the free (up to 2K ROM) version, so this is my way to pay back the community. I'm quite amazed at what functionality can be fitted into 2K even with a higher level (and not as efficient) compiler like Basic. In my little 2K 16F628 I have LCD routines (with scrolling), input and output support using 74HC chips to extend the IO, infrared input, speech recognition, a light sensor, a temperature sensor and my own RS485 peer to peer protocol. I'm still debugging the final version of the code and am about 50 instructions over with the current code. A key to keeping the code size down with so much functionality is that I use a lot of state machines (using globals) and timer variables updated from a single TIMER0 ISR.

I don't use any of the standard library routines for the above functionality but wrote my own by looking at the listing for the library ASM and writing the equivalent Basic statements or ASM callouts. This is especially useful for getting rid of all the back and forth bank switching, which is usually unnecessary (but you have to be careful to keep the bank setting the same as the entry setting when you exit your ASM, as the compiler does not expect it to change).

Something strange I have noticed as the ROM gets close to being filled, a small change like removing or adding even a simple line like clearing a variable can add 30 or 40 instructions when compiling, yet the listing shows only a couple of instructions added so the compiler is making modifications somewhere else in the listing that I can't track. I have even seen where adding an extra line, then compiling I end up with more free space not less.

When is the new version of the compiler due that has SSA optimizations? It's been a while since the last update.

Dany, sure, feel free to add a link, I put this material together to help others. Janni, thank you for the tips, I like the one about booleans and 'absolute', I'll edit the original post to add it.

janni · #7 Post by **janni** » 27 Mar 2010 23:55

deandob wrote:Something strange I have noticed as the ROM gets close to being filled, a small change like removing or adding even a simple line like clearing a variable can add 30 or 40 instructions when compiling, yet the listing shows only a couple of instructions added so the compiler is making modifications somewhere else in the listing that I can't track. I have even seen where adding an extra line, then compiling I end up with more free space not less.

The compiler does a lot of optimisation and small changes may sometimes have a disproportionate influence on the final code or RAM organisation.
BTW, when one's pretty sure of one's own optimisation, setting the compiler optimisation level to 0 may lead to smaller final code. That's because higher settings switch on an optimisation mechanism that's efficient for some groups of statements (mostly expressions), but less efficient for others (mostly the well written ones).

deandob wrote:When is the new version of the compiler due that has SSA optimizations? It's been a while since the last update.

Nobody knows that, I'm afraid. It'll be first implemented in the dsPIC compilers (which, BTW, have already had a release with a nice set of other new features that will also be implemented later in the PIC compilers).

deandob · #8 Post by **deandob** » 28 Mar 2010 00:53

Thanks Janni. You have added another tip on compiler optimisation that I'll edit into the original post.

Dany · #9 Post by **Dany** » 28 Mar 2010 10:07

deandob wrote:Dany, sure, feel free to add a link, I put this material together to help others.

Thanks! see http://www.rosseeld.be/DRO/PIC/index.htm#CodeSize

deandob · #10 Post by **deandob** » 29 Mar 2010 14:12

I'm getting close to the end of my optimization checking, and am able to fit all my functionality in with over 80 ROM instructions to spare.

However when replacing the last of the library routines with my own code, I have found when I removed the library and added one line of code I lost over 80 ROM words!
rxbuffer[rxcount] = UART1_Read() <== extra 13 words for the library routine
rxbuffer[rxcount] = RCREG <== BASIC equivalent (+ a couple of lines setting register bits), should save 10 words but is 80 words more.

The routine with the above changes stays about the same size, yet a number of other routines grow in size if I use the second line above instead of the first, which includes an extra library routine. I checked the listings and the compiler is doing a lot more bank setting all over the code, which looks unnecessary. For example:
0x00E0 0x1283 BCF STATUS, 5
0x00E1 0x1303 BCF STATUS, 6
0x00E2 0x01F2 CLRF R2
L_Framework_ShiftByte12:
0x00E3 0x1283 BCF STATUS, 5 <== This line is not present if I use the UART1_Read routine. This extra bank setting is redundant!
;Framework.mbas,408 :: VOL_CLOCK_PIN = 0

Can anyone explain why the compiler is adding all this extra bank setting unnecessarily, after such a small and unrelated change? The compiler optimization level makes no difference. Is this a bug in the compiler optimiser or am I missing something with the PIC memory architecture?

janni · #11 Post by **janni** » 29 Mar 2010 15:48

It's not a bug per se, but the optimiser is apparently lacking if it leaves unnecessary bank switching. As I don't use mE's compilers with PIC16s (I prefer assembly there), I don't have a ready tip. If the general rule of placing the most frequently used variables in the same bank (preferably BANK0) doesn't help, try rearranging your statements.

Note that the internal compiler variables (Rx) are placed in the 'common' RAM space (0x70..0x7F) in the PIC16F628 and require no bank switching. It's sometimes advantageous to use them as temporary variables instead of your own. The PRO compiler allows that.
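The effect of variable placement on bank switching can be estimated with a simple cost model. The Python sketch below is a deliberate simplification: it counts one BCF/BSF per RP bit that has to change between consecutive accesses, treats in-bank offsets 0x70..0x7F as common RAM that never needs a switch (as described above for the PIC16F628), and ignores everything else the real compiler does. The function name and the example addresses are illustrative only.

```python
# Rough model: minimum number of BCF/BSF STATUS instructions needed for a
# sequence of RAM accesses, given where the variables have been placed.
COMMON_LOW, COMMON_HIGH = 0x70, 0x7F  # common RAM offsets, visible in all banks


def bank_switch_cost(addresses: list[int], start_bank: int = 0) -> int:
    """Count RP0/RP1 bit changes needed to reach each address in turn."""
    bank = start_bank
    cost = 0
    for addr in addresses:
        offset = addr & 0x7F
        if COMMON_LOW <= offset <= COMMON_HIGH:
            continue                      # common RAM: no bank select needed
        needed = addr >> 7
        cost += bin(bank ^ needed).count("1")  # one instruction per changed bit
        bank = needed
    return cost


# Operands in bank 1 and bank 2, used alternately (as in Dean's listing).
assert bank_switch_cost([0xEF, 0x122, 0xEF, 0x122]) == 7
# Same accesses with both variables moved into bank 0.
assert bank_switch_cost([0x2F, 0x22, 0x2F, 0x22]) == 0
# Same accesses using common RAM for the temporaries.
assert bank_switch_cost([0x70, 0x71, 0x70, 0x71]) == 0
```

The absolute numbers are not meaningful, since the compiler may emit more switches than this minimum, but the comparison shows why grouping variables in one bank or borrowing common RAM pays off.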

deandob · #12 Post by **deandob** » 29 Mar 2010 21:16

Thanks Janni. Another 2 tips to add to the list.

Regarding using R0 - Rx, is it always safe to use them as local / temporary variables?

janni · #13 Post by **janni** » 30 Mar 2010 00:18

Unless used extensively (compiler needs them, too), there's no problem.

deandob · #14 Post by **deandob** » 03 Apr 2010 22:31

A couple more tips added.

It would be nice to see a few more members post their optimisation techniques / tips.

rmteo · #15 Post by **rmteo** » 03 Apr 2010 23:17

deandob wrote:A couple more tips added.

It would be nice to see a few more members post their optimisation techniques / tips.

As janni mentioned, most users probably do not have the know-how or the inclination to look into this. PICs are cheap and moving to a larger and/or more capable device - if the situation calls for it - is probably the path most would take. If it means having to buy a full version of the compiler, I think that many will agree that $200 is well worth the asking price. After all, the guys at mikroE do need to make a living just like the rest of us.


Copyright© 2020 MikroElektronika d.o.o.