5.1 Elements of Assembly Language
Like any other language, assembly language has its own words, rules and syntax. The basic elements of assembly language are:
- Labels;
- Instructions;
- Directives; and
- Comments.
Syntax of Assembly language
When writing a program in assembly language, specific rules must be observed so that the program can be compiled into executable “HEX-code” without errors. These compulsory rules are called syntax and there are only a few of them:
- Every program line may consist of a maximum of 255 characters;
- Every program line to be compiled must start with a symbol, label, mnemonic or directive;
- Text following the mark “;” in a program line represents a comment ignored (not compiled) by the assembler; and
- All the elements of one program line (labels, instructions etc.) must be separated by at least one space character. For clarity, the TAB key is commonly used instead, which makes it easy to align the columns containing labels, directives etc. in a program.
Numbers
If the octal number system, nowadays considered obsolete, is disregarded, assembly language allows numbers to be written in one of three number systems:
Decimal Numbers
If not stated otherwise, the assembly language considers all numbers to be decimal. All ten digits are used (0,1,2,3,4,5,6,7,8,9). Since at most 2 bytes are used for saving them in the microcontroller, the largest decimal number that can be written in assembly language is 65535. If it is necessary to state explicitly that a number is in decimal format, it has to be followed by the letter “D”. For example 1234D.
Hexadecimal Numbers
Hexadecimal numbers are commonly used in programming. There are 16 digits in the hexadecimal number system (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). The largest hexadecimal number that can be written in assembly language is FFFF. It corresponds to the decimal number 65535. In order to distinguish hexadecimal numbers from decimal ones, they are followed by the letter “h” (either in upper- or lowercase). For example 54h. A hexadecimal number whose first digit is a letter is written with a leading zero (for example 0FFFFh, as in the operator examples in this chapter), so that it cannot be mistaken for a symbol.
Binary Numbers
Binary numbers are often used when the value of each individual bit of some of the registers is important, since each binary digit represents one bit. There are only two digits in use (0 and 1). The largest binary number written in assembly language is 1111111111111111. In order to distinguish binary numbers from other numbers, they are followed by the letter “b” (either in upper- or lowercase). For example 01100101B.
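The three notations can be summarized with a short model written in Python. It is only a sketch of the rules described above, not the assembler's actual parser; the function name parse_number is hypothetical, and the 16-bit limit is taken from the text.

```python
DIGITS = "0123456789ABCDEF"
RADIX_BY_SUFFIX = {"D": 10, "H": 16, "B": 2}


def parse_number(text):
    """Return the value of a literal such as 1234D, 54h or 01100101B."""
    t = text.strip().upper()
    # The last letter selects the number system; no suffix means decimal.
    if t and t[-1] in RADIX_BY_SUFFIX:
        radix, digits = RADIX_BY_SUFFIX[t[-1]], t[:-1]
    else:
        radix, digits = 10, t
    # Every remaining character must be a valid digit of that system.
    if not digits or any(c not in DIGITS[:radix] for c in digits):
        raise ValueError(f"invalid literal: {text}")
    value = int(digits, radix)
    if value > 0xFFFF:  # at most 2 bytes are available for a number
        raise ValueError(f"{text} does not fit in 16 bits")
    return value


print(parse_number("1234D"), parse_number("54h"), parse_number("01100101B"))
```

Note that a literal such as 1B is read as binary here, because the final letter is always treated as the suffix; a hexadecimal value therefore always needs the trailing “h”.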
Operators
Some assembler commands accept logical and mathematical expressions instead of symbols having specific values. For example:
IF (VERSION>1)
LCALL Table_2
USING VERSION+1
ENDIF
...
As seen, the assembler is capable of computing some values and including them in the program code. The following mathematical and logical operations are available:
| NAME | OPERATION | EXAMPLE | RESULT |
|---|---|---|---|
| + | Addition | 10+5 | 15 |
| - | Subtraction | 25-17 | 8 |
| * | Multiplication | 7*4 | 28 |
| / | Division (remainder discarded) | 7/4 | 1 |
| MOD | Remainder of division | 7 MOD 4 | 3 |
| SHR | Shift bits to the right | 1000B SHR 2 | 0010B |
| SHL | Shift bits to the left | 1010B SHL 2 | 101000B |
| NOT | Negation (one's complement of number) | NOT 1 | 1111111111111110B |
| AND | Logical AND | 1101B AND 0101B | 0101B |
| OR | Logical OR | 1101B OR 0101B | 1101B |
| XOR | Exclusive OR | 1101B XOR 0101B | 1000B |
| LOW | 8 least significant bits | LOW(0AADDH) | 0DDH |
| HIGH | 8 most significant bits | HIGH(0AADDH) | 0AAH |
| EQ, = | Equal | 7 EQ 4 or 7=4 | 0 (false) |
| NE, <> | Not equal | 7 NE 4 or 7<>4 | 0FFFFH (true) |
| GT, > | Greater than | 7 GT 4 or 7>4 | 0FFFFH (true) |
| GE, >= | Greater or equal | 7 GE 4 or 7>=4 | 0FFFFH (true) |
| LT, < | Less than | 7 LT 4 or 7<4 | 0 (false) |
| LE, <= | Less or equal | 7 LE 4 or 7<=4 | 0 (false) |
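The results in the table can be reproduced with a small Python model. This is a sketch under one assumption drawn from the table itself: every value is treated as a 16-bit unsigned number, which is why NOT 1 gives 1111111111111110B and a true comparison gives 0FFFFH. The names MASK, BINARY_OPS and UNARY_OPS are hypothetical.

```python
MASK = 0xFFFF  # assumption: every result is kept as a 16-bit unsigned value


def truth(flag):
    """Comparisons yield 0FFFFH for true and 0 for false, as in the table."""
    return MASK if flag else 0


BINARY_OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "*": lambda a, b: (a * b) & MASK,
    "/": lambda a, b: (a // b) & MASK,   # the remainder is discarded
    "MOD": lambda a, b: (a % b) & MASK,
    "SHR": lambda a, n: (a >> n) & MASK,
    "SHL": lambda a, n: (a << n) & MASK,
    "AND": lambda a, b: a & b & MASK,
    "OR": lambda a, b: (a | b) & MASK,
    "XOR": lambda a, b: (a ^ b) & MASK,
    "EQ": lambda a, b: truth(a == b),
    "NE": lambda a, b: truth(a != b),
    "GT": lambda a, b: truth(a > b),
    "GE": lambda a, b: truth(a >= b),
    "LT": lambda a, b: truth(a < b),
    "LE": lambda a, b: truth(a <= b),
}

UNARY_OPS = {
    "NOT": lambda a: ~a & MASK,          # one's complement within 16 bits
    "LOW": lambda a: a & 0xFF,           # lower byte
    "HIGH": lambda a: (a >> 8) & 0xFF,   # upper byte
}

print(bin(UNARY_OPS["NOT"](1)), hex(UNARY_OPS["HIGH"](0xAADD)))
```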
Symbols
Every register, constant, address or subroutine can be assigned a specific symbol in assembly language, which considerably facilitates the process of writing a program. For example, if the P0.3 input pin is connected to a push button used to stop some process manually (push button STOP), the process of writing a program will be much simpler if the P0.3 bit is assigned the same name as the push button, i.e. “pushbutton_STOP”. Of course, like in any other language, there are specific rules to be observed as well:
- Symbols may be written using all letters of the alphabet (A-Z, a-z), decimal digits (0-9) and two special characters ("?" and "_"). Assembly language is not case sensitive.
For example, the following symbols will be considered identical:
Serial_Port_Buffer
SERIAL_PORT_BUFFER
- In order to distinguish symbols from constants (numbers), every symbol starts with a letter or one of two special characters (? or _).
- A symbol may consist of a maximum of 255 characters, but only the first 32 are taken into account. In the following example, the first two symbols will be considered duplicates (error), while the third and fourth symbols will be considered different:
START_ADDRESS_OF_TABLE_AND_CONSTANTS_1
START_ADDRESS_OF_TABLE_AND_CONSTANTS_2
TABLE_OF_CONSTANTS_1_START_ADDRESS
TABLE_OF_CONSTANTS_2_START_ADDRESS
- Some symbols cannot be used when writing a program in assembly language because they are already part of instructions or assembly directives. For example, a register or subroutine cannot be assigned the name “A” or “DPTR” because there are registers having the same name.
Here is a list of symbols not allowed to be used during programming in assembly language:
A | AB | ACALL | ADD | ADDC | AJMP | AND | ANL | AR0 | AR1
AR2 | AR3 | AR4 | AR5 | AR6 | AR7 | BIT | BSEG | C | CALL
CJNE | CLR | CODE | CPL | CSEG | DA | DATA | DB | DBIT | DEC
DIV | DJNZ | DPTR | DS | DSEG | DW | END | EQ | EQU | GE
GT | HIGH | IDATA | INC | ISEG | JB | JBC | JC | JMP | JNB
JNC | JNZ | JZ | LCALL | LE | LJMP | LOW | LT | MOD | MOV
MOVC | MOVX | MUL | NE | NOP | NOT | OR | ORG | ORL | PC
POP | PUSH | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7
RET | RETI | RL | RLC | RR | RRC | SET | SETB | SHL | SHR
SJMP | SUBB | SWAP | USING | XCH | XCHD | XDATA | XOR | XRL | XSEG
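The symbol rules given above (permitted characters, case insensitivity, 32 significant characters and reserved names) can be expressed as a short Python check. This is a sketch of the rules as stated, not the assembler's own code; the function names are hypothetical and RESERVED holds only a small excerpt of the list above.

```python
import string

SIGNIFICANT = 32                      # only the first 32 characters count
FIRST = set(string.ascii_letters + "?_")
REST = FIRST | set(string.digits)
RESERVED = {"A", "C", "DPTR", "PC", "R0", "MOV", "EQU", "ORG"}  # excerpt only


def symbol_key(name):
    """Return the form in which two symbols are compared for equality."""
    return name.upper()[:SIGNIFICANT]


def is_valid_symbol(name):
    """Check a name against the rules listed in this section."""
    if not name or len(name) > 255:
        return False
    if name[0] not in FIRST:          # must not start with a digit
        return False
    if any(c not in REST for c in name):
        return False
    return name.upper() not in RESERVED


a = "START_ADDRESS_OF_TABLE_AND_CONSTANTS_1"
b = "START_ADDRESS_OF_TABLE_AND_CONSTANTS_2"
print(symbol_key(a) == symbol_key(b))  # the two names differ only after character 32
```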
Labels
A label is a special type of symbol used to represent a textual version of an address in ROM or RAM memory. Labels are always placed at the beginning of a program line. Without them it is very complicated to call a subroutine or execute jump and branch instructions. They are easy to use:
- A symbol (label) with an easily recognizable name should be written at the beginning of the program line from which a subroutine starts or to which a jump should be executed.
- In instructions that call a subroutine or perform a jump, it is sufficient to enter the name of the label instead of the address in the form of a 16-bit number.
During the process of compiling, the assembler automatically replaces such symbols with appropriate addresses.
Directives
Unlike instructions, which are compiled and written to the chip program memory, directives are commands of the assembly language itself and have no influence on the operation of the microcontroller. Some of them are an obligatory part of every program, while others are used only to facilitate or speed up the work. Directives are written in the column reserved for instructions. Only one directive is allowed per program line.
EQU directive
The EQU directive is used to replace a number by a symbol. For example:
MAXIMUM EQU 99
After using this directive, every appearance of the symbol “MAXIMUM” in the program will be interpreted by the assembler as the number 99 (MAXIMUM = 99). A symbol may be defined this way only once in the program. For this reason, the EQU directive is mostly used at the beginning of the program.
SET directive
The SET directive is also used to replace a number by a symbol. The significant difference compared to the EQU directive is that the SET directive can be used an unlimited number of times:
SPEED SET 45
SPEED SET 46
SPEED SET 57
BIT directive
The BIT directive is used to replace a bit address by a symbol. The bit address must be in the range of 0 to 255. For example:
TRANSMIT BIT PSW.7 ;Bit 7 of the PSW register is assigned the name "TRANSMIT"
OUTPUT BIT 6 ;Bit at address 06 is assigned the name "OUTPUT"
RELAY BIT 81h ;Bit at address 81h (bit 1 of Port 0) is assigned the name "RELAY"
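To see which numeric bit address stands behind a notation such as PSW.7, the mapping used by the 8051 can be sketched in Python. The function name bit_address is hypothetical. The mapping itself is the standard one: bytes 20h to 2Fh of internal RAM hold bits 00h to 7Fh, and an SFR whose byte address is divisible by 8 holds the bits starting at that same address.

```python
def bit_address(byte_addr, bit):
    """Return the 8051 bit address of bit `bit` (0-7) in byte `byte_addr`."""
    if not 0 <= bit <= 7:
        raise ValueError("bit number must be 0..7")
    if 0x20 <= byte_addr <= 0x2F:
        # Bit-addressable RAM: 16 bytes hold bit addresses 00h..7Fh.
        return (byte_addr - 0x20) * 8 + bit
    if 0x80 <= byte_addr <= 0xFF and byte_addr % 8 == 0:
        # Bit-addressable SFRs: the bit address starts at the byte address.
        return byte_addr + bit
    raise ValueError("this byte is not bit-addressable")


PSW = 0xD0  # address of the PSW register
P0 = 0x80   # address of Port 0
print(hex(bit_address(PSW, 7)), hex(bit_address(P0, 1)))
```

With this mapping, PSW.7 corresponds to bit address 0D7h and the pins of Port 0 occupy bit addresses 80h to 87h.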
CODE directive
The CODE directive is used to assign a symbol to a program memory address. Since the maximum capacity of program memory is 64K, the address must be in the range of 0 to 65535. For example:
RESET CODE 0 ;Memory location 00h called "RESET"
TABLE CODE 1024 ;Memory location 1024 (400h) called "TABLE"
DATA directive
The DATA directive is used to assign a symbol to an address within internal RAM. The address must be in the range of 0 to 255. It is possible to change or assign a new name to any register. For example:
TEMP12 DATA 32 ;Register at address 32 is named "TEMP12"
STATUS_R DATA 0D0h ;PSW register is assigned the name "STATUS_R"
IDATA directive
The IDATA directive is used to assign a symbol to an address in internal RAM that is accessed by indirect addressing (through register R0 or R1). For example:
TEMP22 IDATA 32 ;Indirectly addressed location at address 32 is named "TEMP22"
TEMP33 IDATA T_ADR ;Indirectly addressed location at the address given by T_ADR is named "TEMP33"
XDATA directive
The XDATA directive is used to assign a name to registers within external (additional) RAM memory. The addresses of these registers cannot be larger than 65535. For example:
TABLE_1 XDATA 2048 ;Register stored in external
;memory at address 2048 is named
;as "TABLE_1"
ORG directive
The ORG directive is used to specify the location in program memory where the code following the directive is to be placed. For example:
BEGINNING ORG 100
...
...
ORG 1000h
TABLE ...
...
This program starts at location 100. The table containing data is to be stored at location 4096 (1000h).
USING directive
The USING directive is used to define which register bank (registers R0-R7) is to be used in the program.
USING 0 ;Bank 0 is used (registers R0-R7 at RAM-addresses 0-7)
USING 1 ;Bank 1 is used (registers R0-R7 at RAM-addresses 8-15)
USING 2 ;Bank 2 is used (registers R0-R7 at RAM-addresses 16-23)
USING 3 ;Bank 3 is used (registers R0-R7 at RAM-addresses 24-31)
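The RAM addresses given in the comments follow a simple rule: each bank occupies 8 consecutive bytes, starting with bank 0 at address 0. A minimal Python sketch (the function name register_address is hypothetical):

```python
def register_address(bank, n):
    """Return the internal RAM address of register Rn in the given bank."""
    if bank not in range(4):
        raise ValueError("bank must be 0..3")
    if n not in range(8):
        raise ValueError("register number must be 0..7")
    return bank * 8 + n


for bank in range(4):
    first = register_address(bank, 0)
    last = register_address(bank, 7)
    print(f"USING {bank}: R0-R7 at RAM addresses {first}-{last}")
```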
END directive
The END directive is used at the end of every program. The assembler stops compiling once it encounters this directive. For example:
...
END ;End of program
Directives used for selecting memory segments
There are 5 directives used for selecting one out of five memory segments in the microcontroller:
CSEG ;Indicates that the next segment refers to program memory;
BSEG ;Selects bit-addressable part of RAM;
DSEG ;Indicates that the next segment refers to the part of internal RAM accessed by
;direct addressing;
ISEG ;Indicates that the next segment refers to the part of internal RAM accessed by
;indirect addressing (using registers R0 and R1); and
XSEG ;Selects external RAM memory.
The CSEG segment is active by default when the assembler starts and remains active until another segment directive is specified. Each of these memory segments has its own address counter, which is cleared every time the assembler is started. Its value can be changed by specifying a value after the keyword AT. The value can be a number, an arithmetic expression or a symbol. For example:
DSEG ;Next segment refers to directly accessed registers; and
BSEG AT 32 ;Selects bit-addressable part of memory with address counter
;moved by 32 bit locations relative to the beginning of that
;memory segment.
A dollar symbol "$" denotes the current value of the address counter in the currently active segment. The following two examples illustrate how this value can be used in practice:
Example 1:
JNB FLAG,$ ;Program will constantly execute this
;instruction (a jump to itself) until
;the flag is set.
Example 2:
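MESSAGE DB 'ALARM turn off engine'
LENGTH EQU $-MESSAGE
These two program lines compute the exact number of characters in the message “ALARM turn off engine”, which is defined at the address assigned the name “MESSAGE”. After the DB line, $ holds the address immediately following the last character, so the difference $-MESSAGE equals the length of the message (21 characters).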
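The arithmetic behind this example can be checked with a short Python model of the address counter. It is only a sketch: the start address 0200h and the function name assemble_db are hypothetical, chosen for illustration.

```python
def assemble_db(start, text):
    """Store one ASCII byte per character, beginning at address `start`.

    Returns the stored bytes and the address counter ($) after the data.
    """
    data = text.encode("ascii")
    return data, start + len(data)


MESSAGE = 0x0200  # hypothetical address of the first character
data, dollar = assemble_db(MESSAGE, "ALARM turn off engine")

length = dollar - MESSAGE           # number of characters in the message
last_offset = dollar - MESSAGE - 1  # offset of the last character

print(length, last_offset)
```

The message has 21 characters, so $-MESSAGE evaluates to 21. Subtracting one more gives 20, which is the offset of the last character rather than the character count.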
DS directive
The DS directive is used to reserve memory space expressed in bytes. It is used when one of the segments ISEG, DSEG or XSEG is currently active. For example:
Example 1:
DSEG ;Select directly addressed part of RAM
DS 32 ;Current value of address counter is incremented by 32
SP_BUFF DS 16 ;Reserve space for serial port buffer
;(16 bytes)
IO_BUFF DS 8 ;Reserve space for I/O buffer in size of 8 bytes
Example 2:
ORG 100 ;Start at address 100
DS 8 ;8 bytes are reserved
LAB ......... ;Program proceeds with execution (address of this location is 108)
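The effect of DS on the address counter can be followed with a small Python model. It is a sketch only: the class name Segment is hypothetical, and for Example 1 it assumes that the DSEG counter starts at 0.

```python
class Segment:
    """Minimal model of one memory segment and its address counter."""

    def __init__(self, start=0):
        self.counter = start
        self.symbols = {}

    def ds(self, size, label=None):
        """Reserve `size` bytes; a label gets the address of the first byte."""
        if label is not None:
            self.symbols[label] = self.counter
        self.counter += size


# Example 1: DS 32, then two labelled buffers.
dseg = Segment()
dseg.ds(32)
dseg.ds(16, "SP_BUFF")
dseg.ds(8, "IO_BUFF")
print(dseg.symbols, dseg.counter)

# Example 2: ORG 100 followed by DS 8.
seg = Segment(start=100)
seg.ds(8)
print(seg.counter)
```

Under these assumptions SP_BUFF starts at address 32 and IO_BUFF at address 48, while in Example 2 the counter reaches 108, the address of the label LAB.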
DBIT directive
The DBIT directive is used to reserve space within bit-addressable part of RAM. The memory size is expressed in bits. It can be used only if the BSEG segment is active. For example:
BSEG ;Bit-addressable part of RAM is selected
IO_MAP DBIT 32 ;First 32 bits occupy space intended for I/O buffer
DB directive
The DB directive is used for writing specified values into program memory. If several values are specified, they are separated by commas. If an ASCII string is specified, it should be enclosed within single quotation marks. This directive can be used only if the CSEG segment is active. For example:
CSEG
DB 22,33,'Alarm',44
If this directive is preceded by a label, the label will point to the first element of the array. It is the number 22 in this example.
DW directive
The DW directive is similar to the DB directive. It is used for writing a two-byte value into program memory. The higher byte is written first, then the lower one.
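The bytes produced by DB and DW can be illustrated with a Python sketch. The function names db and dw are hypothetical models, not assembler code; the byte order of dw follows the statement above that the higher byte is written first.

```python
def db(*items):
    """Model of DB: numbers become one byte, strings one byte per character."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        elif 0 <= item <= 0xFF:
            out.append(item)
        else:
            raise ValueError(f"{item} does not fit in one byte")
    return bytes(out)


def dw(*values):
    """Model of DW: each value becomes two bytes, higher byte first."""
    out = bytearray()
    for value in values:
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{value} does not fit in two bytes")
        out += value.to_bytes(2, "big")
    return bytes(out)


print(list(db(22, 33, "Alarm", 44)))
print(dw(0x1234).hex())
```

The DB line from the example above occupies 8 bytes of program memory, and a DW value of 1234h is stored as the byte 12h followed by 34h.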
IF, ENDIF and ELSE directives
These directives are used to create so-called conditional blocks in the program. Each of these blocks starts with the directive IF and ends with the directive ENDIF or ELSE. The statement or symbol (in parentheses) following the IF directive represents a condition which determines the part of the program to be compiled:
- If the statement is true or if the symbol is equal to one, the program will include all instructions up to the directive ELSE or ENDIF.
- If the statement is false or if the symbol value is equal to zero, these instructions are ignored, i.e. not compiled, and the program continues with the instructions following the directive ELSE or ENDIF.
Example 1:
IF (VERSION>3)
LCALL Table_2
LCALL Addition
ENDIF
...
If the program version is later than 3 (the statement is true), the two instructions calling the subroutines “Table_2” and “Addition” will be compiled. If the statement in parentheses is false (VERSION<=3), the two instructions calling the subroutines will not be compiled.
Example 2:
If the value of the symbol called “Model” is equal to one, the two instructions between the directives IF and ELSE will be compiled and the program continues with the instructions following the directive ENDIF (all instructions between ELSE and ENDIF are ignored). Otherwise, if Model=0, the instructions between IF and ELSE are ignored and the assembler compiles only the instructions following the directive ELSE.
IF (Model)
MOV R0,#BUFFER
MOV A,@R0
ELSE
MOV R0,#EXT_BUFFER
MOVX A,@R0
ENDIF
...
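The selection performed by the assembler in Example 2 can be imitated with a few lines of Python. This is a deliberately simplified sketch: the function name select_lines is hypothetical, and the condition is limited to a single symbol in parentheses, so expressions such as VERSION>3 are not handled.

```python
def select_lines(lines, symbols):
    """Return the program lines that remain after IF/ELSE/ENDIF processing."""
    kept = []
    active = []  # one entry per open IF: is the current branch compiled?
    for line in lines:
        words = line.split()
        head = words[0].upper() if words else ""
        if head == "IF":
            name = line.split(None, 1)[1].strip().strip("()")
            active.append(symbols.get(name, 0) != 0)
        elif head == "ELSE":
            active[-1] = not active[-1]   # switch to the other branch
        elif head == "ENDIF":
            active.pop()
        elif all(active):                 # every enclosing branch is compiled
            kept.append(line)
    return kept


source = [
    "IF (Model)",
    "MOV R0,#BUFFER",
    "MOV A,@R0",
    "ELSE",
    "MOV R0,#EXT_BUFFER",
    "MOVX A,@R0",
    "ENDIF",
]

print(select_lines(source, {"Model": 1}))
print(select_lines(source, {"Model": 0}))
```

With Model equal to one only the first pair of instructions remains, and with Model equal to zero only the pair following ELSE.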
Control directives
Control directives start with a dollar symbol $. They are used to determine which files are to be used by the assembler during compilation, where the executable file is to be stored, as well as the final layout of the compiled program, called the Listing. There are many control directives, but only a few of them are of importance:
$INCLUDE directive
This directive enables the assembler to use data stored in other files during compilation. For example:
$INCLUDE(TABLE.ASM)
$MOD8253 directive
The $MOD8253 directive refers to a file containing the names and addresses of all SFRs of the 8253 microcontrollers. By means of this file and the directive of the same name, the assembler can compile a program written using register names. If they are not used, it is necessary to specify the name and address of every SFR to be used at the beginning of the program.