Chapter 5: Assembly Language
Introduction
The time has come for even the most hardware-oriented readers to compromise if they want to stay “in the game”. Unlike other circuits, which only need to be connected to other components and a power supply to be of use, microcontrollers also require a program. Luckily, their evolution has not progressed very far yet, so all of them (for the time being) “understand” only one language: machine language. That is the good news. The bad news is that this language of zeros and ones, primitive as it is, can be understood only by microcontrollers and a few experts. In order to bridge this gap between machine and humans, the first programming language above machine code, assembly language, was created.
The main problem, remembering the codes which the electronics recognizes as commands, was solved, but a new one, equally complicated for both us and “them” (the microcontrollers), arose. The conflict was resolved to everyone's satisfaction by means of a PC program called an assembler (not an original name at all) and a simple device called a programmer.
By means of this program, the computer receives commands in the form of abbreviations, in an environment familiar to us, and unerringly translates them into a so-called “executable file”. This is the crucial moment, when the program is assembled into machine code. The resulting file (also called a HEX file) represents a series of binary numbers that mean nothing to us but are completely clear to electronic circuits. A program written in assembly language cannot be executed in practice until this file is programmed into the microcontroller's memory. This is when the last link in the chain, the programmer, comes on the scene. It is nothing special: a small device connected to a PC through one of its ports, with a socket for the chip. Press the button or click the mouse and that's it!
5.1 Elements of Assembler
Although simple, assembly language is basically like any other language, which means that it has its own words, rules and syntax. Its basic elements are:
- Labels
- Instructions
- Directives
- Comments
Syntax of Assembly language
When writing a program in assembly language, it is necessary to observe specific rules so that the translation into executable “HEX code” runs with no errors. These obligatory rules are called syntax, and there are only a few of them:
- Every program line may consist of a maximum of 255 characters.
- Every program line that is to be assembled must start with a symbol, an assembler control, a label, a mnemonic or a directive.
- Everything that follows the mark “;” in a program line is a comment and will not be translated.
- All elements of one program line (labels, instructions etc.) must be separated by at least one whitespace character. For the sake of clarity, the TAB key is commonly used instead of a space, so that the columns with labels, directives etc. are easy to recognize in a program.
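A short sketch of how these rules look in practice. The label START and the values are purely illustrative, and the colon after the label is an assumption that holds for ASM51-style assemblers:

```asm
; label   mnemonic  operands     comment (everything after ";" is ignored)
START:    MOV       A,#25        ; load the accumulator with decimal 25
          INC       A            ; A is now 26
          SJMP      START        ; jump back to the line labelled START
          END                    ; tells the assembler to stop translating
```

Each column is separated by TAB or spaces, so labels, mnemonics, operands and comments line up and are easy to scan.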
Numbers
If the octal numeric system, already considered obsolete, is disregarded, the assembler allows numbers to be written in one of three numeric systems:
Decimal Numbers
If not indicated otherwise, the assembler interprets all numbers as decimal. All ten digits are in use (0,1,2,3,4,5,6,7,8,9). Since the assembler stores a number in at most 2 bytes, the greatest number that can be written in this system is 65535. If it has to be emphasised that a number is in decimal format, it is followed by the letter “D”. For example 1234D.
Hexadecimal Numbers
This is a common way of writing numbers in programming. Instead of 10, there are 16 digits in use (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). In assembler, the greatest number that can be written in this system is FFFF (corresponds to decimal number 65535). In order to distinguish them from decimal numbers, hexadecimal numbers are followed by the letter “h” (in upper- or lowercase). For example 54h. A number whose first digit is a letter is written with a leading zero (for example 0AADDh), so that it cannot be mistaken for a symbol.
Binary Numbers
Binary numbers are often used when the value of each individual bit in some register is important, since each digit of a binary number represents one bit. There are only two digits in use (0 and 1). The greatest number in this numeric system that the assembler recognizes as correct is 1111111111111111. In order to distinguish them from other numbers, binary numbers are followed by the letter “b” (in upper- or lowercase). For example 01100101B.
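As an illustration, the lines below load the same value into the accumulator in each notation. This is only a sketch, and the chosen value (101 decimal) is arbitrary:

```asm
        MOV  A,#101          ; decimal, no suffix needed
        MOV  A,#101D         ; decimal, marked explicitly with D
        MOV  A,#65h          ; hexadecimal: 6*16 + 5 = 101
        MOV  A,#01100101B    ; binary: 64 + 32 + 4 + 1 = 101
```

All four lines produce the same machine code, so the choice of notation is only a matter of readability.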
Operators
Instead of symbols which have a fixed value, some assembler commands allow the use of logical and mathematical expressions. For example:
IF (VERSION>1)
LCALL Table_2
USING VERSION+1
ENDIF
...
As can be seen, the assembler is able to compute some values on its own and place them in the program code. In doing so, it distinguishes between the following mathematical and logical operations:
| Name | Operation | Example | Result |
| + | Addition | 10+5 | 15 |
| - | Subtraction | 25-17 | 8 |
| * | Multiplication | 7*4 | 28 |
| / | Division (with no remainder) | 7/4 | 1 |
| MOD | Remainder of division | 7 MOD 4 | 3 |
| SHR | Shift register bits to the right | 1000B SHR 2 | 0010B |
| SHL | Shift register bits to the left | 1010B SHL 2 | 101000B |
| NOT | Negation (one's complement of number) | NOT 1 | 1111111111111110B |
| AND | Logical AND | 1101B AND 0101B | 0101B |
| OR | Logical OR | 1101B OR 0101B | 1101B |
| XOR | Exclusive OR | 1101B XOR 0101B | 1000B |
| LOW | Low-order 8 bits | LOW(0AADDH) | 0DDH |
| HIGH | High-order 8 bits | HIGH(0AADDH) | 0AAH |
| EQ, = | Equal | 7 EQ 4 or 7=4 | 0 (false) |
| NE,<> | Not equal | 7 NE 4 or 7<>4 | 0FFFFH (true) |
| GT, > | Greater than | 7 GT 4 or 7>4 | 0FFFFH (true) |
| GE, >= | Greater or equal | 7 GE 4 or 7>=4 | 0FFFFH (true) |
| LT, < | Less than | 7 LT 4 or 7<4 | 0 (false) |
| LE,<= | Less or equal | 7 LE 4 or 7<=4 | 0 (false) |
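The operators in the table are evaluated by the assembler, not by the microcontroller, so only the final result ends up in program memory. A minimal sketch, in which the symbol TABLE_ADDR is a hypothetical constant introduced for illustration:

```asm
TABLE_ADDR EQU  0AADDh                      ; hypothetical 16-bit constant
           MOV  R6,#HIGH(TABLE_ADDR)        ; assembled as MOV R6,#0AAh
           MOV  R7,#LOW(TABLE_ADDR)         ; assembled as MOV R7,#0DDh
           MOV  A,#((7 MOD 4) SHL 2)        ; 7 MOD 4 = 3, shifted left twice = 12
```

Writing the expression instead of the precomputed number keeps the origin of the value visible in the source.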
Symbols
In assembly language, every register, constant, address or subroutine can be assigned a specific symbol, which makes writing a program considerably easier. For example, if input pin P0.3 is connected to a pushbutton used to manually interrupt some process (pushbutton STOP), the program will be easier to write if the bit P0.3 is assigned the name “pushbutton_STOP”. Of course, as in any other language, there are specific rules:
- Symbols may contain all letters of the alphabet (A-Z, a-z), decimal digits (0-9) and two special characters ("?" and "_"). The assembler does not differentiate between upper case and lower case.
For example, the following symbols will be treated as identical:
Serial_Port_Buffer
SERIAL_PORT_BUFFER
- In order to differ from a constant (number), every symbol must start with a letter or one of the two special characters (? or _).
- A symbol may consist of a maximum of 255 characters, but only the first 32 of them are taken into account. In the following example, the first two symbols will be interpreted as duplicates (error), while the third and fourth symbols will be accepted as different ones:
START_ADDRESS_OF_TABLE_AND_CONSTANTS_1
START_ADDRESS_OF_TABLE_AND_CONSTANTS_2
TABLE_OF_CONSTANTS_1_START_ADDRESS
TABLE_OF_CONSTANTS_2_START_ADDRESS
- Some symbols cannot be used because they are already names of instructions, registers or assembler directives. For example, a register or subroutine cannot be given the name “A” or “DPTR” because registers with these names already exist.
The list of symbols not allowed to be used:
| A | AB | ACALL | ADD |
| ADDC | AJMP | AND | ANL |
| AR0 | AR1 | AR2 | AR3 |
| AR4 | AR5 | AR6 | AR7 |
| BIT | BSEG | C | CALL |
| CJNE | CLR | CODE | CPL |
| CSEG | DA | DATA | DB |
| DBIT | DEC | DIV | DJNZ |
| DPTR | DS | DSEG | DW |
| END | EQ | EQU | GE |
| GT | HIGH | IDATA | INC |
| ISEG | JB | JBC | JC |
| JMP | JNB | JNC | JNZ |
| JZ | LCALL | LE | LJMP |
| LOW | LT | MOD | MOV |
| MOVC | MOVX | MUL | NE |
| NOP | NOT | OR | ORG |
| ORL | PC | POP | PUSH |
| R0 | R1 | R2 | R3 |
| R4 | R5 | R6 | R7 |
| RET | RETI | RL | RLC |
| RR | RRC | SET | SETB |
| SHL | SHR | SJMP | SUBB |
| SWAP | USING | XCH | XCHD |
| XDATA | XOR | XRL | XSEG |
Labels
Labels are a special type of symbol used to denote an address in the program, such as the start of a subroutine or the target of a jump. They are recognizable by always being written at the beginning of a program line. Labels make it possible to call subroutines and execute jump or branch instructions without working out addresses by hand. They are easy to use:
- A symbol (label) with an easily recognizable name is written at the beginning of the program line where a subroutine starts or to which a jump should lead.
- In the instruction which calls the subroutine or performs the jump, it is sufficient to enter the name of the label instead of an address in the form of a 16-bit number.
During assembly into machine code, the assembler automatically replaces such symbols with the correct addresses.
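The two steps above can be sketched as follows. The label names (LOOP, STOP, DELAY, WAIT) and the counter values are made up for the example, and the colon after each label is an assumption that holds for ASM51-style assemblers:

```asm
        MOV   R7,#10        ; repeat counter
LOOP:   LCALL DELAY         ; call the subroutine by its label
        DJNZ  R7,LOOP       ; decrement R7 and jump back to LOOP until it is 0
STOP:   SJMP  STOP          ; stay here when done

DELAY:  MOV   R6,#200       ; the subroutine starts at this label
WAIT:   DJNZ  R6,WAIT       ; short busy-wait loop
        RET                 ; return to the instruction after LCALL
        END
```

No address is written by hand anywhere in this fragment; the assembler works out where DELAY and LOOP end up in program memory.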
Directives
Unlike instructions, which are translated into machine code and written to on-chip program memory, directives are commands for the assembler itself and have no effect on the operation of the microcontroller. Some of them are an obligatory part of every program, while others are used only to facilitate or enhance the work.
Directives are written in the column reserved for instructions. Only one directive is allowed per program line.
EQU directive
By means of this directive, a numeric value is replaced by a symbol. For example:
MAXIMUM EQU 99
After this directive, the assembler will interpret every appearance of the symbol “MAXIMUM” in the program as the number 99 (MAXIMUM = 99). A symbol can be defined in this way only once, so the EQU directive is mostly used at the beginning of the program.
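A possible use of such a constant is sketched below. The labels NEXT and DONE and the wrap-around logic are assumptions added for illustration only:

```asm
MAXIMUM EQU  99                   ; defined once, near the top of the program

        CJNE A,#MAXIMUM,NEXT      ; A is not 99: go and increment it
        CLR  A                    ; A has reached MAXIMUM: wrap around to 0
        SJMP DONE
NEXT:   INC  A
DONE:   NOP                       ; the program would continue here
```

If the limit ever changes, only the EQU line needs to be edited.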
SET directive
Similar to the EQU directive, the SET directive replaces a numeric value with a symbol. The significant difference is that a symbol defined with SET can be redefined an unlimited number of times:
SPEED SET 45
SPEED SET 46
SPEED SET 57
BIT directive
By means of this directive, a bit address is replaced by a symbol (the bit address must be in the range of 0-255). For example:
TRANSMIT BIT PSW.7 ;Transmit bit (bit 7 of the PSW register)
;is assigned the name "TRANSMIT"
OUTPUT BIT 6 ;Bit at address 06 is assigned the name "OUTPUT"
RELAY BIT 81h ;Bit at address 81h (Port 0) is assigned the name
;"RELAY"
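Once a bit has a name, bit instructions can refer to it directly. The sketch below assumes a standard 8051 core, where Port 0 occupies bit addresses 80h-87h and Port 1 starts at 90h. The symbol MOTOR and the active-low pushbutton wiring are hypothetical:

```asm
PUSHBUTTON_STOP BIT 83h              ; P0.3
MOTOR           BIT 90h              ; P1.0, hypothetical output

        SETB MOTOR                   ; switch the output on
WAIT:   JB   PUSHBUTTON_STOP,WAIT    ; loop while the pin reads 1 (not pressed)
        CLR  MOTOR                   ; pin read 0 (pressed): switch the output off
```

The program text now says what the bits mean, rather than where they are.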
CODE directive
By means of this directive, an address in program memory is designated as a symbol. Since the maximum capacity of program memory is 64K, the address must be in the range of 0-65535. For example:
RESET CODE 0 ;Memory location 00h called "RESET"
TABLE CODE 1024 ;Memory location 1024 (400h) called "TABLE"
DATA directive
By means of this directive, an address within internal RAM is designated as a symbol (the address must be in the range of 0-255). In other words, any selected register may change its name or be assigned a new one. For example:
TEMP12 DATA 32 ;Register at address 32 is named
;as "TEMP12"
STATUS_R DATA 0D0h ;PSW register is assigned the name
;"STATUS_R"
IDATA directive
By means of this directive, a register in the indirectly addressed part of internal RAM (a register accessed through an address held in R0 or R1) changes its name or is assigned a new one. For example:
TEMP22 IDATA 32 ;Indirectly addressed register at
;address 32 is named as "TEMP22"
TEMP33 IDATA T_ADR ;Indirectly addressed register at the
;address given by T_ADR is named as "TEMP33"
XDATA directive
This directive is used to name registers within external (additional) RAM memory. The address of a register defined this way cannot be greater than 65535. For example:
TABLE_1 XDATA 2048 ;Register stored in external
;memory at address 2048 is named
;as "TABLE_1"
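The three kinds of data symbols are reached with different instructions. The following is a sketch only. The symbol BUFFER and all addresses are illustrative:

```asm
TEMP12   DATA   32            ; internal RAM, direct addressing
BUFFER   IDATA  48            ; hypothetical location read through @R0
TABLE_1  XDATA  2048          ; external RAM

         MOV   A,TEMP12       ; direct access, same as MOV A,32
         MOV   R0,#BUFFER     ; put the address of BUFFER into R0 ...
         MOV   A,@R0          ; ... and read that location indirectly
         MOV   DPTR,#TABLE_1  ; external RAM is addressed through DPTR
         MOVX  A,@DPTR        ; read the byte at external address 2048
```

Note that the symbols only stand for addresses; the instruction (MOV, MOV @R0 or MOVX) decides which memory is actually accessed.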
ORG directive
This directive is used to define the location in program memory where the part of the program following the directive is to be placed. For example:
BEGINNING ORG 100
...
...
ORG 1000h
TABLE ...
...
The program begins at location 100. The table with data will start at location 4096 (1000h).
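A typical layout built with ORG might look like the sketch below. The labels MAIN and VALUES and the chosen addresses are assumptions made for the example. The only fixed fact is that the 8051 starts executing at address 0 after reset:

```asm
        ORG   0             ; execution starts here after reset
        LJMP  MAIN          ; jump to the main program

        ORG   100h          ; the main program is placed at address 100h
MAIN:   MOV   A,#0
        SJMP  MAIN          ; endless loop

        ORG   1000h         ; a table of constants is placed at address 1000h
VALUES: DB    1,2,3
        END
```

The gaps between the three parts are simply left unused in program memory.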
USING directive
This directive is used to define which register bank (registers R0-R7) will be used in the following part of the program.
USING 0 ;Bank 0 is used (registers R0-R7 at RAM-addresses 0-7)
USING 1 ;Bank 1 is used (registers R0-R7 at RAM-addresses 8-15)
USING 2 ;Bank 2 is used (registers R0-R7 at RAM-addresses 16-23)
USING 3 ;Bank 3 is used (registers R0-R7 at RAM-addresses 24-31)
END directive
This directive must be at the end of every program. Once it encounters this directive, the assembler stops translating the program into machine code. For example:
...
END ;End of program
Directives selecting memory segments
There are five such directives, each selecting one of the five memory segments available in the microcontroller:
CSEG ;Indicates that next segment refers to program memory.
BSEG ;Selects bit-addressable part of RAM.
DSEG ;Indicates that next segment refers to internal RAM (part accessed by direct addressing).
ISEG ;Indicates that next segment refers to the “upper” part of internal RAM (part accessed by indirect addressing using registers R0 and R1).
XSEG ;Selects external RAM memory.
When the assembler starts, the segment CSEG is active by default and remains active until another segment directive is specified. Each of these memory segments has its own internal address counter, which is cleared every time the assembler is started. Its value can be changed by specifying a value after the keyword AT (it may be a number, an arithmetic expression or a symbol). For example:
DSEG ;Next segment refers to registers with direct access.
BSEG AT 32 ;Selects bit-addressable part of memory with address counter
;moved by 32 bit locations relative to the beginning of that
;memory segment.
The dollar symbol "$" denotes the current value of the address counter in the currently active segment. The following two examples illustrate how this can be used in practice:
Example 1:
JNB FLAG,$ ;Program will constantly execute this
;instruction (jump instruction), until
;the flag is set.
Example 2:
MESSAGE DB 'ALARM turn off engine'
LENGTH EQU $-MESSAGE
Using the previous two program lines, the exact number of characters in the message “ALARM turn off engine”, which is defined at the address labelled “MESSAGE”, can be computed. In the second line, $ is the address immediately after the last character, so the difference $-MESSAGE is the length of the message (21 characters).
DS directive
This directive reserves space in memory, expressed in bytes. It is used when the segment ISEG, DSEG or XSEG is currently active. For example:
Example 1:
DSEG ;Selects part of RAM with direct addressing
DS 32 ;Actual value of address counter is incremented by 32
SP_BUFF DS 16 ;Reserves space for serial port buffer
;(16 bytes)
IO_BUFF DS 8 ;Reserves space for I/O buffer in size of 8
;bytes
Example 2:
ORG 100 ;Starts at address 100
DS 8 ;8 bytes are reserved
LAB ......... ;Program continues execution (address of this
;location is 108)
DBIT directive
This directive reserves space within bit-addressable part of RAM (size is expressed in bits). It can be used only if the BSEG segment is active. For example:
BSEG ;Bit-addressable part of RAM is selected
IO_MAP DBIT 32 ;First 32 bits occupy space provided
;for I/O buffer
DB directive
This directive is used for writing the indicated values to program memory. If several values are specified one after another, they are separated by commas. An ASCII string is enclosed in single quotation marks. This directive can be used only if the segment CSEG is active. For example:
CSEG
DB 22,33,'Alarm',44
When written before this directive, a label will point to the first value in the array (in this example, the number 22).
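Values stored with DB are usually read back with the MOVC instruction. The sketch below assumes the array is given the hypothetical label VALUES (with the colon used by ASM51-style assemblers):

```asm
        MOV   DPTR,#VALUES    ; DPTR points to the first byte (22)
        MOV   A,#2            ; index of the third byte in the array
        MOVC  A,@A+DPTR       ; A = 'A' (41h), the first character of 'Alarm'
DONE:   SJMP  DONE            ; stop here

VALUES: DB    22,33,'Alarm',44
        END
```

Each character of the string occupies one byte, so the array above takes up eight bytes of program memory in total.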
DW directive
This directive has the same purpose as the DB directive, but it is followed by two-byte values (the high byte is written first, the low byte afterwards).
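The byte order can be seen in a small sketch. The label WORDS and the values are chosen only for illustration:

```asm
WORDS:  DW    1234h          ; stored as the bytes 12h, 34h
        DW    1000           ; 1000 = 03E8h, stored as 03h, 0E8h
        DB    12h,34h        ; the same two bytes as the first DW, written with DB
```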
IF, ENDIF and ELSE directives
These directives are used to create so-called conditional blocks in a program. Each of these blocks starts with the directive IF and ends with the directive ENDIF or ELSE. The statement or symbol (in parentheses) following the directive IF represents a condition which determines the part of the program to be assembled into machine code:
- If the statement is true or if the symbol is equal to one, the program will include all instructions up to the directive ELSE or ENDIF.
- If the statement is false or if the symbol value is equal to zero, all these instructions are ignored (not translated) and the program continues with the commands following the directive ELSE or ENDIF.
Example 1:
IF (VERSION>3)
LCALL Table_2
LCALL Addition
ENDIF
...
If the program version is later than 3 (the statement is true), the calls to the subroutines “Table_2” and “Addition” are assembled into the program. If the statement in parentheses is false (VERSION<=3), the two instructions calling the subroutines are ignored and are not assembled.
Example 2:
If the value of the symbol “Model” is equal to one, the first two instructions after the directive IF will be assembled into machine code and the program will afterwards continue with the instructions following the directive ENDIF (all instructions between ELSE and ENDIF are ignored). Otherwise, if Model=0, the instructions between IF and ELSE are ignored and the assembler translates only the instructions following the directive ELSE.
IF (Model)
MOV R0,#BUFFER
MOV A,@R0
ELSE
MOV R0,#EXT_BUFFER
MOVX A,@R0
ENDIF
...
Control directives
These directives are recognizable by the dollar symbol $ as their first character. They are used to define which files the assembler is to use during assembly, where the executable file is to be stored, and the final appearance of the assembled program. Although there are many directives in this category, only a few of them are really important:
$INCLUDE directive
The name of this directive says enough about its purpose. During assembly, it enables the assembler to use data stored in another file. For example:
$INCLUDE(TABLE.ASM)
$MOD8253 directive
The $MOD8253 directive refers to the file in which the names and addresses of all SFRs of 8253 microcontrollers are stored. By using this file, through the directive of the same name, the assembler can translate a program that refers to these registers by name only. If the file is not used, an exact name and address must be defined in the introductory part of the program for every SFR that the program uses.
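What defining the registers by hand could look like is sketched below. It assumes that no $MOD file is in use, and it shows only three registers, at the addresses they have on the standard 8051 core:

```asm
; Assumed: no $MOD file is used, so each SFR needed is named by hand.
ACC     DATA  0E0h          ; accumulator
PSW     DATA  0D0h          ; program status word
P1      DATA  90h           ; port 1

        MOV   P1,#0FFh      ; write 0FFh to port 1, using its name
        MOV   PSW,#0        ; clear the program status word
        END
```

With the control directive in place, these DATA lines become unnecessary, and would have to be removed to avoid defining the same symbols twice.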
