5.1 Elements of Assembly Language
Assembly language is basically like any other language, which means that it has its own words, rules and syntax. The basic elements of assembly language are:
- Labels;
- Instructions;
- Directives; and
- Comments.
Syntax of Assembly language
When writing a program in assembly language, specific rules must be observed so that the program can be assembled into executable “HEX code” without errors. These compulsory rules are called syntax, and there are only a few of them:
- Every program line may consist of a maximum of 255 characters;
- Every program line to be compiled must start with a symbol, label, mnemonic or directive;
- Text following the mark “;” in a program line represents a comment ignored (not compiled) by the assembler; and
- All the elements of one program line (labels, instructions etc.) must be separated by at least one space character. For better readability, the TAB key is commonly used instead, so that columns with labels, directives etc. are easy to distinguish in a program.
If the octal number system, generally considered obsolete, is disregarded, assembly language allows numbers to be written in one of three number systems:
Unless stated otherwise, the assembly language treats all numbers as decimal. All ten digits are used (0,1,2,3,4,5,6,7,8,9). Since numbers are stored in at most 2 bytes, the largest decimal number that can be written in assembly language is 65535. To state explicitly that a number is decimal, it is followed by the letter “D”, for example 1234D.
Hexadecimal numbers are commonly used in programming. There are 16 digits in the hexadecimal number system (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). The largest hexadecimal number that can be written in assembly language is FFFF, which corresponds to the decimal number 65535. To distinguish hexadecimal numbers from decimal ones, they are followed by the letter “h” (either upper- or lowercase), for example 54h. A hexadecimal number beginning with a letter is usually preceded by 0 (for example 0FFFFh) so that the assembler does not mistake it for a symbol.
Binary numbers are often used when the value of each individual bit of a register is important, since each binary digit represents one bit. Only two digits are used (0 and 1). The largest binary number that can be written in assembly language is 1111111111111111. To distinguish binary numbers from other numbers, they are followed by the letter “b” (either upper- or lowercase), for example 01100101B.
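The suffix rules above can be sketched as a small literal parser. This is a hypothetical helper (`parse_number`), not code from any real assembler; it assumes the 16-bit limit described above, ignores octal, and assumes that a number must begin with a digit (so FFFFh has to be written 0FFFFh), a common assembler convention.

```python
def parse_number(literal: str) -> int:
    """Convert an assembly-style numeric literal to an int.

    No suffix or D -> decimal, H -> hexadecimal, B -> binary
    (suffix case does not matter). Values are limited to 16 bits.
    """
    text = literal.strip().upper()
    bases = {"D": 10, "H": 16, "B": 2}
    if len(text) > 1 and text[-1] in bases:
        base, digits = bases[text[-1]], text[:-1]
    else:
        base, digits = 10, text
    allowed = "0123456789ABCDEF"[:base]
    # Assumption: a number must begin with a digit, otherwise it would be read as a symbol
    if not digits or not digits[0].isdigit() or any(c not in allowed for c in digits):
        raise ValueError(f"not a valid number: {literal!r}")
    value = int(digits, base)
    if value > 0xFFFF:
        raise ValueError(f"{literal!r} does not fit in 16 bits")
    return value


print(parse_number("54h"), parse_number("01100101B"), parse_number("0FFFFh"))  # 84 101 65535
```

Checking every digit against the base also rejects literals such as 102B, which contain a digit that does not exist in the binary system.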
Some assembler commands accept logical and mathematical expressions in place of symbols with specific values. The assembler computes such values itself and includes them in the program code, using the following mathematical and logical operations:
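Operator | Operation | Example | Result
/ | Division (with no remainder) | 7/4 | 1
MOD | Remainder of division | 7 MOD 4 | 3
SHR | Shift bits to the right | 1000B SHR 2 | 0010B
SHL | Shift bits to the left | 1010B SHL 2 | 101000B
NOT | Negation (one's complement of a number) | NOT 1 | 1111111111111110B
AND | Logical AND | 1101B AND 0101B | 0101B
OR | Logical OR | 1101B OR 0101B | 1101B
XOR | Exclusive OR | 1101B XOR 0101B | 1000B
LOW | 8 low significant bits | LOW(0AADDh) | 0DDh
HIGH | 8 high significant bits | HIGH(0AADDh) | 0AAh
EQ, = | Equal to | 7 EQ 4 or 7=4 | false
NE, <> | Not equal to | 7 NE 4 or 7<>4 | true
GT, > | Greater than | 7 GT 4 or 7>4 | true
GE, >= | Greater or equal | 7 GE 4 or 7>=4 | true
LT, < | Less than | 7 LT 4 or 7<4 | false
LE, <= | Less or equal | 7 LE 4 or 7<=4 | false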
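The operators listed above can be modelled in a few lines. This sketch assumes, as stated earlier for numbers, that expressions are evaluated on 16-bit unsigned values, which is why SHL and NOT are masked to 16 bits. Relational results are returned as Python booleans; how a particular assembler encodes "true" as a number is an assumption left to its manual.

```python
MASK16 = 0xFFFF  # assumption: expressions are evaluated on 16-bit unsigned values

BINARY = {
    "/": lambda a, b: a // b,                # division, remainder discarded
    "MOD": lambda a, b: a % b,               # remainder of division
    "SHR": lambda a, b: a >> b,              # shift bits right
    "SHL": lambda a, b: (a << b) & MASK16,   # shift bits left, truncated to 16 bits
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

UNARY = {
    "NOT": lambda a: ~a & MASK16,            # one's complement within 16 bits
    "LOW": lambda a: a & 0xFF,               # low-order byte
    "HIGH": lambda a: (a >> 8) & 0xFF,       # high-order byte
}

RELATIONAL = {
    "EQ": lambda a, b: a == b,
    "NE": lambda a, b: a != b,
    "GT": lambda a, b: a > b,
    "GE": lambda a, b: a >= b,
    "LT": lambda a, b: a < b,
    "LE": lambda a, b: a <= b,
}

print(bin(BINARY["SHL"](0b1010, 2)), bin(UNARY["NOT"](1)), hex(UNARY["HIGH"](0xAADD)))
```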
Every register, constant, address or subroutine can be assigned a specific symbol in assembly language, which considerably facilitates the process of writing a program. For example, if the P0.3 input pin is connected to a push button used to stop some process manually (push button STOP), the process of writing a program will be much simpler if the P0.3 bit is assigned the same name as the push button, i.e. “pushbutton_STOP”. Of course, like in any other language, there are specific rules to be observed as well:
- For the purpose of writing symbols in assembly language, all letters of the alphabet (A-Z, a-z), decimal digits (0-9) and two special characters ("?" and "_") can be used. Assembly language is not case sensitive.
For example, the following symbols are considered identical:
Serial_Port_Buffer
SERIAL_PORT_BUFFER
- In order to distinguish symbols from constants (numbers), every symbol starts with a letter or one of two special characters (? or _).
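- A symbol may consist of a maximum of 255 characters, but only the first 32 are taken into account. In the following example, the first two symbols are considered duplicates (an error), while the third and fourth symbols are considered different:
START_ADDRESS_OF_TABLE_AND_CONSTANTS_1
START_ADDRESS_OF_TABLE_AND_CONSTANTS_2
TABLE_OF_CONSTANTS_1_START_ADDRESS
TABLE_OF_CONSTANTS_2_START_ADDRESS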
- Some symbols cannot be used in an assembly language program because they are already part of instructions or assembler directives. For example, a register or subroutine cannot be named “A” or “DPTR” because registers with these names already exist.
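Symbols that cannot be used when programming in assembly language include all instruction mnemonics (MOV, ADD, LCALL etc.), the register names used by instructions (A, AB, C, DPTR, PC, R0-R7, AR0-AR7), the names of directives (EQU, SET, BIT, DATA, ORG, END etc.) and the operators (AND, OR, NOT, MOD, SHL, SHR, HIGH, LOW, EQ, NE etc.).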
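The symbol rules above can be expressed as a normalisation function that returns the form of a symbol the assembler actually compares. The helper `symbol_key` is hypothetical, and the `RESERVED` set is an incomplete sample chosen for illustration only, not the full list of reserved words.

```python
SIGNIFICANT = 32  # only the first 32 characters are compared
VALID_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789?_")
# Hypothetical, incomplete sample of reserved words (illustration only)
RESERVED = {"A", "DPTR", "MOV", "LCALL", "EQU", "AND"}


def symbol_key(name: str) -> str:
    """Return the part of a symbol that the assembler compares."""
    upper = name.upper()  # symbols are not case sensitive
    if not upper or len(upper) > 255:
        raise ValueError(f"symbol length must be 1-255 characters: {name!r}")
    # A symbol starts with a letter, "?" or "_" and uses only the allowed characters
    if upper[0].isdigit() or not set(upper) <= VALID_CHARS:
        raise ValueError(f"invalid symbol: {name!r}")
    if upper in RESERVED:
        raise ValueError(f"reserved word: {name!r}")
    return upper[:SIGNIFICANT]


print(symbol_key("pushbutton_STOP"))  # PUSHBUTTON_STOP
```

Two symbols clash exactly when their keys are equal, so a symbol table can simply be a dictionary indexed by `symbol_key(name)`.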
A label is a special type of symbol used to represent a textual version of an address in ROM or RAM memory. Labels are always placed at the beginning of a program line. Without them, calling a subroutine or executing jump and branch instructions would be very complicated. They are easy to use:
- A symbol (label) with an easily recognizable name should be written at the beginning of the program line where a subroutine starts or to which a jump should be made.
- In instructions that call a subroutine or execute a jump, it is sufficient to enter the label name instead of the address in the form of a 16-bit number.
During the process of compiling, the assembler automatically replaces such symbols with appropriate addresses.
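The replacement of labels by addresses can be sketched as a two-pass process: the first pass records the address of every label, the second substitutes those addresses into the operands. The `(label, mnemonic, operand)` tuple format and the function `resolve_labels` are hypothetical; the instruction lengths are those of the 8051 instructions used (NOP and RET take 1 byte, LCALL and LJMP take 3 bytes).

```python
# Byte lengths of the few 8051 instructions used in this sketch
SIZES = {"NOP": 1, "RET": 1, "LCALL": 3, "LJMP": 3}


def resolve_labels(program, origin=0):
    """Two-pass label resolution over (label, mnemonic, operand) tuples."""
    # Pass 1: walk the program and record the address of every label
    labels = {}
    address = origin
    for label, mnemonic, _operand in program:
        if label is not None:
            labels[label.upper()] = address
        address += SIZES[mnemonic]
    # Pass 2: replace each label operand with the address found in pass 1
    resolved = [
        (mnemonic, labels[operand.upper()] if operand is not None else None)
        for _label, mnemonic, operand in program
    ]
    return labels, resolved


program = [
    (None, "LCALL", "DELAY"),  # address 0: call a subroutine by name
    ("LOOP", "LJMP", "LOOP"),  # address 3: endless loop
    ("DELAY", "NOP", None),    # address 6: subroutine starts here
    (None, "RET", None),       # address 7
]
labels, resolved = resolve_labels(program)
print(labels)    # {'LOOP': 3, 'DELAY': 6}
print(resolved)  # [('LCALL', 6), ('LJMP', 3), ('NOP', None), ('RET', None)]
```

Two passes are needed because LCALL DELAY refers to a label that appears later in the program, so its address is not yet known when the call is first read.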
Unlike instructions, which are compiled and written to the chip's program memory, directives are commands of the assembly language itself and have no direct influence on the operation of the microcontroller. Some of them are an obligatory part of every program, while others are used only to make writing the program easier or faster. Directives are written in the column reserved for instructions, and only one directive is allowed per program line.
The EQU directive is used to replace a number by a symbol. For example:
MAXIMUM EQU 99
After this directive, every appearance of the symbol “MAXIMUM” in the program is interpreted by the assembler as the number 99 (MAXIMUM = 99). A symbol may be defined this way only once in the program, which is why the EQU directive is mostly used at the beginning of the program.
The SET directive is also used to replace a number by a symbol. The significant difference compared to the EQU directive is that SET can redefine the same symbol an unlimited number of times:
SPEED SET 45
SPEED SET 46
SPEED SET 57
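The difference between EQU and SET can be sketched as a small symbol table that remembers which directive defined each symbol. The class is hypothetical, and the rule that a SET symbol may only be redefined by another SET (never by EQU) is an assumption about typical assembler behaviour.

```python
class SymbolTable:
    """Toy symbol table contrasting EQU (define once) with SET (redefinable)."""

    def __init__(self):
        self._values = {}
        self._directives = {}  # which directive last defined each symbol

    def define(self, name: str, directive: str, value: int) -> None:
        key, directive = name.upper(), directive.upper()
        previous = self._directives.get(key)
        # Assumption: only a SET symbol may be redefined, and only by another SET
        if previous is not None and not (previous == directive == "SET"):
            raise ValueError(f"symbol {name} is already defined with {previous}")
        self._values[key] = value
        self._directives[key] = directive

    def value(self, name: str) -> int:
        return self._values[name.upper()]


table = SymbolTable()
table.define("MAXIMUM", "EQU", 99)
for speed in (45, 46, 57):
    table.define("SPEED", "SET", speed)
print(table.value("MAXIMUM"), table.value("SPEED"))  # 99 57
```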
The BIT directive is used to replace a bit address by a symbol. The bit address must be in the range of 0 to 255. For example:
TRANSMIT BIT PSW.7 ;Bit 7 of the PSW register
;is assigned the name "TRANSMIT"
OUTPUT BIT 6 ;Bit at address 06 is assigned the name "OUTPUT"
RELAY BIT 81h ;Bit at address 81h (bit 1 of Port 0) is assigned
;the name "RELAY"
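On the standard 8051, bit addresses 00h-7Fh lie in the bit-addressable internal RAM (bytes 20h-2Fh, eight bits per byte), while bit addresses 80h-FFh belong to the SFRs whose address is divisible by 8. The hypothetical helper below maps a bit address to its byte address and bit number; it shows, for instance, that PSW.7 is bit address D7h and that decimal 81 falls in RAM rather than in Port 0.

```python
def bit_location(bit_address: int) -> tuple:
    """Return (byte address, bit number) for an 8051 bit address."""
    if not 0 <= bit_address <= 0xFF:
        raise ValueError("bit address must be in the range 0-255")
    if bit_address < 0x80:
        # Bit-addressable internal RAM: bytes 20h-2Fh, 8 bits per byte
        return 0x20 + bit_address // 8, bit_address % 8
    # SFR area: only SFRs whose address is divisible by 8 are bit-addressable
    return bit_address & 0xF8, bit_address & 0x07


for addr in (6, 81, 0x81, 0xD7):
    byte, bit = bit_location(addr)
    print(f"bit address {addr:02X}h -> byte {byte:02X}h, bit {bit}")
```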
The CODE directive is used to assign a symbol to a program memory address. Since the maximum capacity of program memory is 64K, the address must be in the range of 0 to 65535. For example:
RESET CODE 0 ;Memory location 00h called "RESET"
TABLE CODE 1024 ;Memory location 1024 (400h) called "TABLE"
The DATA directive is used to assign a symbol to an address within internal RAM. The address must be in the range of 0 to 255. It is possible to change or assign a new name to any register. For example:
TEMP12 DATA 32 ;Register at address 32 is named "TEMP12"
STATUS_R DATA 0D0h ;PSW register (address D0h) is assigned
;the name "STATUS_R"
The IDATA directive is used to change or assign a new name to an indirectly addressed register. For example:
TEMP22 IDATA 32 ;Indirectly addressed register at address 32
;is named "TEMP22"
TEMP33 IDATA T_ADR ;Indirectly addressed register at the address
;represented by symbol T_ADR is named "TEMP33"
The XDATA directive is used to assign a name to registers within external (additional) RAM memory. The addresses of these registers cannot be larger than 65535. For example:
TABLE_1 XDATA 2048 ;Register stored in external
;memory at address 2048 is named "TABLE_1"
The ORG directive is used to specify the location in program memory where the program following the directive is to be placed. For example:
BEGINNING ORG 100
In this example, the program following the directive starts at location 100. In the same way, a table containing data could be placed at location 1000h (4096) by writing ORG 1000h in front of it.
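The USING directive is used to define which register bank (registers R0-R7) is to be used in the program. The directive only informs the assembler which bank is in use; the bank itself is selected at run time by the RS1 and RS0 bits of the PSW register.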
USING 0 ;Bank 0 is used (registers R0-R7 at RAM-addresses 0-7)
USING 1 ;Bank 1 is used (registers R0-R7 at RAM-addresses 8-15)
USING 2 ;Bank 2 is used (registers R0-R7 at RAM-addresses 16-23)
USING 3 ;Bank 3 is used (registers R0-R7 at RAM-addresses 24-31)
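The address ranges listed above follow a simple rule on the standard 8051: bank n occupies the eight bytes starting at RAM address 8n, and the bank is selected by RS1 (PSW.4) and RS0 (PSW.3). The two hypothetical helpers below compute both.

```python
def bank_registers(bank: int) -> dict:
    """RAM addresses of R0-R7 for a given register bank (0-3)."""
    if bank not in range(4):
        raise ValueError("the 8051 has register banks 0-3")
    base = 8 * bank  # each bank occupies 8 consecutive bytes starting at address 0
    return {f"R{n}": base + n for n in range(8)}


def psw_bank_bits(bank: int) -> int:
    """PSW bits that select the bank: RS1 is PSW.4 and RS0 is PSW.3."""
    if bank not in range(4):
        raise ValueError("the 8051 has register banks 0-3")
    return bank << 3


print(bank_registers(2)["R0"], bank_registers(2)["R7"])  # 16 23
print(hex(psw_bank_bits(3)))                             # 0x18
```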
The END directive is used at the end of every program. The assembler stops compiling once it encounters this directive. For example:
END ;End of program
Directives used for selecting memory segments
There are 5 directives used for selecting one out of five memory segments in the microcontroller:
CSEG ;Indicates that the next segment refers to program memory;
BSEG ;Selects bit-addressable part of RAM;
DSEG ;Indicates that the next segment refers to the part of internal RAM accessed by
;direct addressing;
ISEG ;Indicates that the next segment refers to the part of internal RAM accessed by
;indirect addressing using registers R0 and R1; and
XSEG ;Selects external RAM memory.
The CSEG segment is active by default when the assembler starts and remains active until another segment directive is specified. Each memory segment has its own address counter, which is cleared every time the assembler is started. Its value can be changed by specifying a value after the keyword AT. The value can be a number, an arithmetic expression or a symbol. For example:
DSEG ;Next segment refers to directly accessed registers
BSEG AT 32 ;Selects bit-addressable part of memory with address counter
;moved by 32 bit locations relative to the beginning of that
;segment
A dollar symbol "$" denotes the current value of the address counter in the currently active segment. The following two examples illustrate how this value can be used in practice:
Example 1:
JNB FLAG,$ ;Program keeps executing this
;instruction (a jump to itself) until
;the flag is set.
Example 2:
MESSAGE DB 'ALARM turn off engine'
LENGTH EQU $-MESSAGE
These two program lines compute the exact number of characters in the message “ALARM turn off engine”, which is defined at the address named “MESSAGE”.
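The length calculation works because DB stores one byte per character, so on the line after DB the address counter $ already points just past the last character. The sketch below models this with a hypothetical helper `place_string` and an arbitrary start address of 100h (any start address gives the same difference).

```python
def place_string(location_counter: int, text: str):
    """Model DB with an ASCII string: one byte per character.

    Returns the address of the first byte and the new value of the
    location counter ($) after the string.
    """
    data = text.encode("ascii")
    return location_counter, location_counter + len(data)


# Hypothetical start address; the difference does not depend on it
MESSAGE, dollar = place_string(0x100, "ALARM turn off engine")
LENGTH = dollar - MESSAGE  # corresponds to $-MESSAGE on the line after DB
print(LENGTH)  # 21
```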
The DS directive is used to reserve memory space expressed in bytes. It is used when one of the segments ISEG, DSEG or XSEG is currently active. For example:
Example 1:
DSEG ;Select directly addressed part of RAM
DS 32 ;Current value of address counter is incremented by 32
SP_BUFF DS 16 ;Reserve space for serial port buffer
IO_BUFF DS 8 ;Reserve space for I/O buffer in size of 8 bytes
Example 2:
ORG 100 ;Start at address 100
DS 8 ;8 bytes are reserved
LAB ......... ;Program proceeds with execution (address of this location is 108)
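Both DS examples only move a segment's address counter, and a label on a DS line receives the counter value before the move. The hypothetical `Segment` class below models this and reproduces the addresses of both examples: SP_BUFF at 32, IO_BUFF at 48, and LAB at 108.

```python
from typing import Optional


class Segment:
    """Address counter of one memory segment, as changed by ORG and DS."""

    def __init__(self, start: int = 0):
        self.counter = start  # current value of $ in this segment
        self.symbols = {}

    def org(self, address: int) -> None:
        self.counter = address

    def ds(self, size: int, label: Optional[str] = None) -> None:
        if label is not None:
            self.symbols[label] = self.counter  # label names the first reserved byte
        self.counter += size  # DS only advances the counter, it stores no data


dseg = Segment()
dseg.ds(32)
dseg.ds(16, "SP_BUFF")
dseg.ds(8, "IO_BUFF")
print(dseg.symbols, dseg.counter)  # {'SP_BUFF': 32, 'IO_BUFF': 48} 56

cseg = Segment()
cseg.org(100)
cseg.ds(8)
print(cseg.counter)  # 108
```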
The DBIT directive is used to reserve space within bit-addressable part of RAM. The memory size is expressed in bits. It can be used only if the BSEG segment is active. For example:
BSEG ;Bit-addressable part of RAM is selected
IO_MAP DBIT 32 ;First 32 bits occupy space intended for I/O buffer
The DB directive is used for writing specified values into program memory. If several values are specified, they are separated by commas. If an ASCII array is specified, it should be enclosed in single quotation marks. This directive can be used only if the CSEG segment is active. For example:
TABLE_1 DB 22,33,'Alarm',44
If this directive is preceded by a label, the label points to the first element of the array, which is the number 22 in this example.
The DW directive is similar to the DB directive. It is used for writing a two-byte value into program memory. The higher byte is written first, then the lower one.
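The byte layouts produced by DB and DW can be sketched as follows. The helper names are hypothetical and the values are illustrative; the DW part follows the rule stated above, higher byte first.

```python
def define_bytes(*items) -> bytes:
    """DB: numbers become single bytes, strings become one byte per character."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        elif 0 <= item <= 0xFF:
            out.append(item)
        else:
            raise ValueError(f"DB value out of range: {item}")
    return bytes(out)


def define_words(*words) -> bytes:
    """DW: each value takes two bytes, higher byte first, then the lower one."""
    out = bytearray()
    for word in words:
        if not 0 <= word <= 0xFFFF:
            raise ValueError(f"DW value out of range: {word}")
        out += word.to_bytes(2, "big")
    return bytes(out)


print(list(define_bytes(22, 33, "Alarm", 44)))  # [22, 33, 65, 108, 97, 114, 109, 44]
print(define_words(0x1234).hex())               # 1234
```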
IF, ENDIF and ELSE directives
These directives are used to create so-called conditional blocks in the program. Each of these blocks starts with the IF directive and ends with the ENDIF directive; an optional ELSE directive splits the block into two parts. The statement or symbol (in parentheses) following the IF directive is a condition that specifies which part of the program is to be compiled:
- If the statement is true or the symbol is equal to one, the assembler compiles all instructions up to the ELSE or ENDIF directive.
- If the statement is false or the symbol is equal to zero, these instructions are ignored, i.e. not compiled, and the assembler continues with the instructions following the ELSE or ENDIF directive.
Example 1:
IF (VERSION>3)
LCALL Table_2
LCALL Addition
ENDIF
If the program version is later than 3 (the statement is true), the two calls are compiled and the subroutines “Table_2” and “Addition” will be executed. If the statement in parentheses is false (VERSION<=3), the two instructions calling the subroutines are not compiled.
Example 2: In a block of the form IF (Model) ... ELSE ... ENDIF, if the symbol “Model” is equal to one, the instructions between IF and ELSE are compiled and the assembler continues with the instructions following ENDIF (all instructions between ELSE and ENDIF are ignored). Otherwise, if Model=0, the instructions between IF and ELSE are ignored and the assembler compiles only the instructions between ELSE and ENDIF.
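The selection rule of conditional blocks can be sketched as a line filter that keeps only the lines the assembler would compile. The function is hypothetical and simplified: it only accepts conditions of the form IF (SYMBOL) and treats any non-zero symbol value as true; nested blocks are handled by keeping one condition per open IF.

```python
def conditional_assembly(lines, symbols):
    """Return only the source lines that would be compiled.

    Simplified assumption: conditions have the form IF (SYMBOL) and are true
    when the symbol's value is non-zero. Nested blocks are supported.
    """
    compiled = []
    conditions = []  # one entry per open IF block: True if its active part is being compiled
    for line in lines:
        parts = line.split(None, 1)
        keyword = parts[0].upper() if parts else ""
        if keyword == "IF":
            name = parts[1].strip().strip("()").strip().upper()
            conditions.append(symbols[name] != 0)
        elif keyword == "ELSE":
            conditions[-1] = not conditions[-1]  # switch to the other part of the block
        elif keyword == "ENDIF":
            conditions.pop()
        elif all(conditions):  # compiled only if every enclosing block is active
            compiled.append(line)
    return compiled


source = ["IF (Model)", "MOV A,#1", "ELSE", "MOV A,#2", "ENDIF", "NOP"]
print(conditional_assembly(source, {"MODEL": 1}))  # ['MOV A,#1', 'NOP']
print(conditional_assembly(source, {"MODEL": 0}))  # ['MOV A,#2', 'NOP']
```

The instructions in `source` are placeholders chosen for illustration only.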
Control directives start with a dollar symbol $. They are used to determine which files the assembler uses during compilation, where the executable file is stored, and the final layout of the compiled program, called the Listing. There are many control directives, but only a few of them are important:
The $INCLUDE directive enables the assembler to use data stored in other files during compilation. For example:
$INCLUDE(TABLE.ASM)
The $MOD8253 directive refers to a file containing the names and addresses of all SFRs of the 8253 microcontrollers. By means of this file, the assembler can compile a program that refers to registers by name. If the directive is not used, the name and address of every SFR used must be specified at the beginning of the program.