System Software
Prepared by P.Vasantha kumari 14
UNIT II ASSEMBLERS
2.1 Basic assembler functions
Assembler
An assembler converts assembly language programs into object files. Object files
contain a combination of machine instructions, data, and the information needed to place
instructions properly in memory.
Figure 2.1 - Assembler
Assembler functions
Convert mnemonic operation codes to their machine language equivalents
Convert symbolic operands to their equivalent machine addresses.
Build the machine instructions in the proper format.
Convert the data constants to internal machine representations
Write the object program and the assembly listing
Error checking is provided
Changes can be quickly and easily incorporated with a reassembly
Features of assemblers
Mnemonic operation codes
Symbolic operands
Data declarations
Assembly Language Statements
Three types of statements are used in assembly language. They are:
1. Imperative Statements
2. Declaration Statements
3. Assembler Directives
1. Imperative statements
An imperative statement indicates an action to be performed during the execution of the
assembled program. Each imperative statement translates into one machine instruction.
2. Declaration Statements
A declaration statement reserves storage or defines a constant for use by the program. It
describes data rather than an action, so it is not translated into an executable instruction.
3. Assembler Directives
An assembler directive instructs the assembler to perform certain actions during the
assembly of a program. Directives can be used to declare variables, reserve storage space
and declare constants. Some of the assembler directives are:
• START specifies the name and starting address of the program
• END indicates the end of the source program and (optionally) specifies the first
executable instruction in the program
• BYTE generates a character or hexadecimal constant, occupying as many bytes as
needed to represent the constant
• WORD generates a one-word integer constant
Statement Format
All statements in an assembly language program have the form
[LABEL] <OPCODE> <OPERAND> COMMENTS
Label: An optional identifier that names the location of the data or code on the same line.
The maximum label length depends on the assembler; many assemblers support labels of up
to 32 characters. A label begins with a letter [A…Z] and is suffixed by a colon (:).
Example: START: LDA #24
OPCODE: The mnemonic for the operation code, that is, the machine instruction or directive
to be carried out. Most opcodes also require operands.
OPERAND: Specifies the data the instruction works on: a constant, a label, immediate data,
a register, or a memory address.
Advantages of assembler
It reduces errors
Faster translation times
Changes can be made more easily and quickly
Disadvantages of assembler
Many instructions are required to achieve small tasks.
Source Programs tend to be large and difficult to follow
Programmers require knowledge of the processor architecture and instruction set
Programs are machine dependent and require complete rewrites if the hardware is
changed
2.2 A simple SIC assembler
Assembler Function
A simple SIC (Simplified Instructional Computer) assembler performs the following
functions:
Convert mnemonic operation codes to their machine language equivalents
Convert symbolic operands to their equivalent machine addresses.
Build the machine instructions in the proper format.
Convert the data constants to internal machine representations
Write the object program and the assembly listing
Assembler directives
The SIC assembler language has the following assembler directives.
START Specify name and starting address for the program
END Indicate the end of the source program and (optionally) specify the first
executable instruction in the program
BYTE Generate character or hexadecimal constant, occupying as many bytes as
needed to represent the constant
WORD Generate one-word integer constant
RESB Reserve the indicated number of bytes for a data area
RESW Reserve the indicated number of words for a data area
Assemblers
An assembler converts assembly language into machine code or object code.
There are two types of assemblers:
o Two pass assembler
Pass 1 assembler
Pass 2 assembler
o One pass assembler
In a two-pass assembler, the first pass scans the source program for label definitions
and assigns addresses, whereas the second pass performs most of the actual translation.
The assembler must also process assembler directive statements. These statements
are not translated into machine instructions; instead, they provide instructions to the
assembler itself.
The assembler directive START specifies the starting address of the object program
and END marks the end of the program.
The assembler must write the object code onto some output device. This object program
will later be loaded into memory for execution.
An object program contains three types of records:
Header
Text
End
Header record contains the program name, starting address and length. Text record
contains the translated instructions and data of the program, together with an
indication of the addresses where these are to be loaded. End record marks the end
of the object program and specifies the address in the program where execution is
to begin.
Functions of Assemblers
Pass 1(define symbols)
Assign addresses to all statements in the program.
Save the values (addresses) assigned to all labels for use in Pass 2.
Perform some processing of assembler directives, including processing that affects
address assignment, such as determining the length of data areas defined by BYTE,
RESW, etc.
Pass 2 (assemble instructions and generate object program)
Assemble instructions (translating operation codes and looking up addresses).
Generate data values defined by BYTE, WORD, etc.
Perform processing of assembler directives not done during Pass 1.
Write the object program and the assembly listing.
Format of Object Program
Header Record
Col. 1 H
Col. 2-7 Program Name
Col. 8-13 Starting address of object program (hexadecimal)
Col. 14-19 Length of object program in bytes (hexadecimal)
Text Record
Col. 1 T
Col. 2-7 Starting address for object code in this record (hexadecimal)
Col. 8-9 Length of object code in this record in bytes (hexadecimal)
Col. 10-69 Object code (hexadecimal)
End Record
Col. 1 E
Col. 2-7 Address of first executable instruction in object program
(hexadecimal)
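The column layouts above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the records are produced without the ^ separators that listings often add for readability.

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col 1 'H', cols 2-7 program name, cols 8-13 start, cols 14-19 length."""
    return "H" + name.ljust(6)[:6] + f"{start:06X}" + f"{length:06X}"


def text_record(start: int, object_codes: list) -> str:
    """Col 1 'T', cols 2-7 start address, cols 8-9 length in bytes, cols 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:  # 60 hex columns hold at most 30 bytes of object code
        raise ValueError("object code does not fit into one Text record")
    return "T" + f"{start:06X}" + f"{len(body) // 2:02X}" + body


def end_record(first_executable: int) -> str:
    """Col 1 'E', cols 2-7 address of the first executable instruction."""
    return "E" + f"{first_executable:06X}"


print(header_record("MAIN", 0x2000, 0x14))
print(text_record(0x2000, ["00200C", "0C200F"]))
print(end_record(0x2000))
```

The length field of a Text record counts bytes, so it is half the number of hexadecimal digits in the object code.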
Example
(i) Assembly Language Program with object Code
LOCCTR  LABEL  OPCODE  OPERAND  OBJ.CODE
        MAIN   START   2000
2000    BEGIN  LDA     NUM1     00200C
2003           STA     NUM2     0C200F
2006           LDCH    CHAR1    502012
2009           STCH    CHAR2    542013
200C    NUM1   WORD    5        000005
200F    NUM2   RESW    1
2012    CHAR1  BYTE    C'A'     41
2013    CHAR2  RESB    1
2014           END     BEGIN
(ii) Object Program
H^MAIN  ^002000^000014
T^002000^0F^00200C^0C200F^502012^542013^000005
T^002012^01^41
E^002000
2.3 Assembler algorithm and Data structures
Internal data structures
Operation Code Table (OPTAB)
Symbol Table (SYMTAB)
Location Counter (LOCCTR)
OPTAB
OPTAB is used to look up mnemonic operation codes and translate them to their
machine language equivalents.
In most cases, OPTAB is a static table. OPTAB must contain the mnemonic operation
code and its machine language equivalent. In more complex assemblers, OPTAB also
contains information about instruction format and length.
OPTAB is usually organized as a hash table, with mnemonic operation code as the
key.
SYMTAB
SYMTAB is used to store values (addresses) assigned to labels.
SYMTAB includes the name and value (address) for each label in the source program
together with flags to indicate error conditions.
• e.g., a symbol defined in two different places
This table may also contain information, such as type or length, about the data area
or instruction labeled.
SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval.
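• The label is the key of SYMTAB.
• Labels are often non-random keys (e.g. LOOP1, LOOP2), so the hash function should
be chosen to perform well on such keys.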
LOCCTR
LOCCTR is a variable used to help in the assignment of addresses.
LOCCTR is initialized to the beginning address specified in the START statement.
After each source statement is processed, the length of the assembled instruction or
data area to be generated is added to LOCCTR.
When a label is reached, the current value of LOCCTR gives the address to be
associated with that label.
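A minimal Python sketch of these three structures follows. It assumes that ordinary dictionaries stand in for the hash tables; the class and method names are hypothetical, and OPTAB holds only four SIC mnemonics for illustration.

```python
# OPTAB: static table, mnemonic -> machine opcode (only four SIC entries shown).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}


class SymbolTable:
    """SYMTAB: label -> address, plus a flag for the duplicate-symbol error."""

    def __init__(self):
        self.entries = {}

    def define(self, label, address):
        """Insert a label; flag it and return False if it is already defined."""
        if label in self.entries:
            self.entries[label]["duplicate"] = True
            return False
        self.entries[label] = {"address": address, "duplicate": False}
        return True

    def lookup(self, label):
        """Return the address of a label, or None if it is undefined."""
        entry = self.entries.get(label)
        return None if entry is None else entry["address"]


symtab = SymbolTable()
locctr = 0x2000                 # LOCCTR initialized from the START operand
symtab.define("BEGIN", locctr)  # a label takes the current LOCCTR value
locctr += 3                     # then LOCCTR advances by the instruction length
```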
2.3.1 Two Pass Assemblers
A two-pass assembler translates the assembly language program into object code or
machine code in two passes, pass 1 and pass 2. The pass 1 algorithm scans the
source program for label definitions and assigns addresses, whereas the pass 2 algorithm
performs the actual translation.
Figure 2.2 – Two Pass Assembler
Pass 1 of a two pass assembler
Step 1: Read the first input line.
Step 2: Check whether the opcode field of the input line is “START”.
(i) If there is an operand after START, initialize LOCCTR to the operand value.
(ii) Otherwise, if there is no value in the operand field, set LOCCTR to zero.
Step 3: Write the line to the intermediate file.
Step 4: Repeat the following for the remaining lines of the program until the opcode field
contains the END directive.
1. If there is a symbol in the label field:
i. Check the symbol table to see whether the symbol has already been stored there. If
so, it is a duplicate symbol and an error message should be displayed.
ii. Otherwise, enter the symbol into SYMTAB along with the current value of
LOCCTR, which is the address assigned to it.
2. If there is an opcode in the opcode field:
i. Search OPTAB for the opcode. If it is present, increment the location
counter (LOCCTR) by three.
ii. Otherwise:
a) If the opcode is WORD, increment LOCCTR by three.
b) If the opcode is BYTE, increment LOCCTR by the length of the constant in
bytes (one byte for C'A').
c) If the opcode is RESW, increment LOCCTR by three times the integer value of
the operand.
d) If the opcode is RESB, increment LOCCTR by the integer value of the
operand.
3. Write every line processed to the intermediate file along with its location
counter value.
Step 5: Calculate the length of the program by subtracting the starting address of the
program from the final value of LOCCTR.
Algorithm
Begin
  Read first input line
  if OPCODE = 'START' then
    Begin
      Save #[OPERAND] as starting address
      Initialize LOCCTR to starting address
      Write line to intermediate file
      Read next input line
    End {if START}
  else
    Initialize LOCCTR to 0
  While OPCODE ≠ 'END' do
    Begin
      if this is not a comment line then
        Begin
          if there is a symbol in the LABEL field then
            Begin
              Search SYMTAB for LABEL
              if found then
                Set error flag (duplicate symbol)
              else
                Insert (LABEL, LOCCTR) into SYMTAB
            End {if symbol}
          Search OPTAB for OPCODE
          if found then
            Add 3 {instruction length} to LOCCTR
          else if OPCODE = 'WORD' then
            Add 3 to LOCCTR
          else if OPCODE = 'RESW' then
            Add 3 * #[OPERAND] to LOCCTR
          else if OPCODE = 'RESB' then
            Add #[OPERAND] to LOCCTR
          else if OPCODE = 'BYTE' then
            Begin
              Find length of constant in bytes
              Add length to LOCCTR
            End {if BYTE}
          else
            Set error flag (invalid operation code)
        End {if not a comment}
      Write line to intermediate file
      Read next input line
    End {while not END}
  Write last line to intermediate file
  Save (LOCCTR - starting address) as program length
End {pass 1}
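The Pass 1 algorithm can be sketched in Python as shown below. This is a simplified illustration, not a complete assembler: it assumes the source has already been split into (label, opcode, operand) tuples with comment lines removed, that the program ends with an END statement, and that OPTAB holds only the four mnemonics used here. The names pass1, constant_length and EXAMPLE are hypothetical.

```python
OPTAB = {"LDA", "STA", "LDCH", "STCH"}  # mnemonics only; opcodes are not needed in Pass 1


def constant_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'05'."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return len(body)             # one byte per character
    if kind == "X":
        return (len(body) + 1) // 2  # two hexadecimal digits per byte
    raise ValueError(f"bad BYTE constant: {operand}")


def pass1(lines):
    """Return (intermediate, symtab, program_length, errors)."""
    symtab, intermediate, errors = {}, [], []
    start = locctr = 0
    source = iter(lines)
    label, opcode, operand = next(source)
    if opcode == "START":
        start = locctr = int(operand, 16)
        intermediate.append((locctr, label, opcode, operand))
        label, opcode, operand = next(source)
    while opcode != "END":
        address = locctr
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += constant_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
        intermediate.append((address, label, opcode, operand))
        label, opcode, operand = next(source)
    intermediate.append((locctr, label, opcode, operand))
    return intermediate, symtab, locctr - start, errors


EXAMPLE = [
    ("MAIN", "START", "2000"), ("BEGIN", "LDA", "NUM1"), ("", "STA", "NUM2"),
    ("", "LDCH", "CHAR1"), ("", "STCH", "CHAR2"), ("NUM1", "WORD", "5"),
    ("NUM2", "RESW", "1"), ("CHAR1", "BYTE", "C'A'"), ("CHAR2", "RESB", "1"),
    ("", "END", "BEGIN"),
]
intermediate, symtab, length, errors = pass1(EXAMPLE)
```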
Example
Input - Assembly Language Program
MAIN   START  2000
BEGIN  LDA    NUM1
       STA    NUM2
       LDCH   CHAR1
       STCH   CHAR2
NUM1   WORD   5
NUM2   RESW   1
CHAR1  BYTE   C'A'
CHAR2  RESB   1
       END    BEGIN
Output – Addresses assigned to instructions
        MAIN   START  2000
2000    BEGIN  LDA    NUM1
2003    **     STA    NUM2
2006    **     LDCH   CHAR1
2009    **     STCH   CHAR2
200C    NUM1   WORD   5
200F    NUM2   RESW   1
2012    CHAR1  BYTE   C'A'
2013    CHAR2  RESB   1
2014           END    BEGIN
Output – Symbol table
BEGIN 2000
NUM1 200C
NUM2 200F
CHAR1 2012
CHAR2 2013
Pass 2 of a two pass assembler
Step 1: Read the first line from the intermediate file.
Step 2: Check whether the opcode field of the input line is “START”; if so, write the line
to the final output file.
Step 3: Repeat the following for the remaining lines of the intermediate file until the
opcode field contains the END directive.
1. If there is a symbol in the operand field, assemble the object code by
combining the machine code equivalent of the instruction with the symbol address.
2. If there is no symbol in the operand field, assign zero as the operand address
and assemble it with the machine code equivalent of the instruction.
3. If the opcode field is BYTE or WORD, convert the constant in the operand field
to object code. RESB and RESW only reserve storage and generate no object code.
4. Write the input line along with the object code to the final output file.
Step 4: Close all open files and exit.
Algorithm
Begin
  Read first input line {from intermediate file}
  if OPCODE = 'START' then
    Begin
      Write listing line
      Read next input line
    End {if START}
  Write Header record to object program
  Initialize first Text record
  While OPCODE ≠ 'END' do
    Begin
      if this is not a comment line then
        Begin
          Search OPTAB for OPCODE
          if found then
            Begin
              if there is a symbol in the OPERAND field then
                Begin
                  Search SYMTAB for OPERAND
                  if found then
                    Store symbol value as operand address
                  else
                    Begin
                      Store 0 as operand address
                      Set error flag (undefined symbol)
                    End
                End {if symbol}
              else
                Store 0 as operand address
              Assemble the object code instruction
            End {if opcode found}
          else if OPCODE = 'BYTE' or 'WORD' then
            Convert constant to object code
          if object code will not fit into the current Text record then
            Begin
              Write Text record to object program
              Initialize new Text record
            End
          Add object code to Text record
        End {if not a comment}
      Write listing line
      Read next input line
    End {while not END}
  Write last Text record to object program
  Write End record to object program
  Write last listing line
End {pass 2}
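A Python sketch of the translation step in Pass 2 is given below. It assumes the standard SIC instruction format (8-bit opcode followed by a 16-bit field holding the index bit and a 15-bit address), no indexed addressing, and an intermediate file already reduced to (address, opcode, operand) tuples. Splitting the object code into Header, Text and End records is left out. The names pass2 and constant_to_hex are hypothetical.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}


def constant_to_hex(opcode, operand):
    """Object code for a WORD or BYTE constant, as a hexadecimal string."""
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"    # one 3-byte word
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return "".join(f"{ord(ch):02X}" for ch in body)  # ASCII codes
    return body.upper()                            # X'..' is already hexadecimal


def pass2(intermediate, symtab):
    """Return (listing, errors); each listing row ends with its object code."""
    listing, errors = [], []
    for address, opcode, operand in intermediate:
        code = ""  # RESW and RESB generate no object code
        if opcode in OPTAB:
            target = 0
            if operand:
                if operand in symtab:
                    target = symtab[operand]
                else:
                    errors.append(f"undefined symbol {operand}")
            code = f"{OPTAB[opcode]:02X}{target:04X}"
        elif opcode in ("BYTE", "WORD"):
            code = constant_to_hex(opcode, operand)
        listing.append((address, opcode, operand, code))
    return listing, errors


SYMTAB = {"BEGIN": 0x2000, "NUM1": 0x200C, "NUM2": 0x200F,
          "CHAR1": 0x2012, "CHAR2": 0x2013}
INTERMEDIATE = [
    (0x2000, "LDA", "NUM1"), (0x2003, "STA", "NUM2"), (0x2006, "LDCH", "CHAR1"),
    (0x2009, "STCH", "CHAR2"), (0x200C, "WORD", "5"), (0x200F, "RESW", "1"),
    (0x2012, "BYTE", "C'A'"), (0x2013, "RESB", "1"),
]
listing, errors = pass2(INTERMEDIATE, SYMTAB)
codes = [row[3] for row in listing]
```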
Example
Input – Assembly Language Program with addresses
        MAIN   START  2000
2000    BEGIN  LDA    NUM1
2003    **     STA    NUM2
2006    **     LDCH   CHAR1
2009    **     STCH   CHAR2
200C    NUM1   WORD   5
200F    NUM2   RESW   1
2012    CHAR1  BYTE   C'A'
2013    CHAR2  RESB   1
2014           END    BEGIN
Input – Symbol table
BEGIN 2000
NUM1 200C
NUM2 200F
CHAR1 2012
CHAR2 2013
Output – Object Code
        MAIN   START  2000
2000    BEGIN  LDA    NUM1     00200C
2003           STA    NUM2     0C200F
2006           LDCH   CHAR1    502012
2009           STCH   CHAR2    542013
200C    NUM1   WORD   5        000005
200F    NUM2   RESW   1
2012    CHAR1  BYTE   C'A'     41
2013    CHAR2  RESB   1
2014           END    BEGIN
2.4 Machine Dependent Assembler Features
Register-to-register instructions are shorter than register-to-memory instructions and
do not require another memory reference, so they should be used instead of
register-to-memory instructions whenever possible.
Most register-to-memory instructions are assembled using either program-counter
relative addressing or base relative addressing.
If extended format is not specified, the assembler first attempts program-counter
relative addressing. If the displacement required for program-counter relative
addressing is out of range, the assembler then attempts to use base relative addressing.
If neither form of relative addressing is applicable, the instruction cannot be properly
assembled and the assembler must generate an error message.
If the required displacement is too large for both forms of relative addressing, the
4-byte extended instruction format must be used. The programmer must specify the
4-byte format by adding the prefix + to the operation code in the source statement.
The assembler directive BASE is used in conjunction with base relative addressing: it
tells the assembler what value the base register will contain during execution.
Indirect addressing is indicated by adding the prefix @ to the operand. Immediate
addressing is specified by adding the prefix # to the immediate operand.
Two important machine-dependent assembler features are:
Instruction Format and Addressing Modes
Program Relocation
(i) Instruction Format and Addressing Modes
Instruction Format: If the required displacement is too large for relative addressing, the
4-byte extended instruction format must be used. The programmer must specify the 4-byte
format by adding the prefix + to the operation code in the source statement. If extended
format is not specified, the assembler first attempts to translate the instruction using
program-counter relative addressing.
Addressing Modes: Most register-to-memory instructions are assembled using either
program-counter relative addressing or base relative addressing. Immediate addressing is
specified by adding the prefix # to the immediate operand. If neither program-counter
relative nor base relative addressing can be used, the 4-byte extended instruction format,
which provides a 20-bit address field, must be used.
(a) Program counter relative addressing
Consider the following assembly language program fragment:
LINE  LOCCTR  LABEL   OPCODE  OPERANDS
10    0000    FIRST   STL     RETADR
12    0003            LDB     #LENGTH
13                    BASE    LENGTH
15    0006    CLOOP   +JSUB   RDREC
40    0017            J       CLOOP
95    0030    RETADR  RESW    1
100   0033    LENGTH  RESW    1
105   0036    BUFFER  RESB    4096
125   1036    RDREC   CLEAR   X
133   103C            +LDT    #4096
160   104E            STCH    BUFFER,X
Example 1: Consider the statement in line 10. During the execution of instructions on
SIC/XE, the program counter is advanced after the instruction is fetched and before it is
executed, so (PC) holds the address of the instruction following line 10, which is 0003.
RETADR is assigned the address 0030. Now calculate the displacement.
For program-counter relative addressing,
TA = (PC) + Disp
Target address (TA) = RETADR = 030 and (PC) = 003
Displacement = TA - (PC) = 030 - 003
TA   = 030 = 0000 0011 0000
(PC) = 003 = 0000 0000 0011 (subtract)
Disp = 02D = 0000 0010 1101
The instruction format for this instruction using program-counter relative addressing is
shown below.
Field   Opcode   n  i  x  b  p  e  Disp
Bits    6        1  1  1  1  1  1  12
Value   0001 01  1  1  0  0  1  0  0000 0010 1101
Object code: 17202D
Example 2: Consider the statement in line 40, a jump to the label CLOOP, which is already
defined at address 0006. For program-counter relative addressing,
TA = (PC) + Disp
Target address (TA) = CLOOP = 006 and (PC) = 01A
(PC) = address of the instruction following line 40.
Displacement = TA - (PC) = 006 - 01A
TA   = 006 = 0000 0000 0110
(PC) = 01A = 0000 0001 1010 (subtract)
Disp = FEC = 1111 1110 1100 (a negative displacement in 12-bit two's complement)
The instruction format for this instruction using program-counter relative addressing is
shown below.
Field   Opcode   n  i  x  b  p  e  Disp
Bits    6        1  1  1  1  1  1  12
Value   0011 11  1  1  0  0  1  0  1111 1110 1100
Object code: 3F2FEC
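The two displacement calculations and the bit layout above can be checked with a short Python sketch. The function names are hypothetical; the opcodes 14 (STL) and 3C (J) are taken from the binary opcode fields in the examples.

```python
def pc_relative_disp(target, pc):
    """12-bit two's complement displacement for program-counter relative addressing."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of range for PC-relative addressing")
    return disp & 0xFFF


def encode_format3(opcode, n, i, x, b, p, e, disp):
    """Pack a 3-byte instruction: 6-bit opcode, flags n i x b p e, 12-bit disp."""
    word = ((opcode >> 2) << 18) | (n << 17) | (i << 16) | (x << 15)
    word |= (b << 14) | (p << 13) | (e << 12) | (disp & 0xFFF)
    return f"{word:06X}"


# Line 10: STL RETADR, with TA = 030 and (PC) = 003
print(encode_format3(0x14, 1, 1, 0, 0, 1, 0, pc_relative_disp(0x030, 0x003)))
# Line 40: J CLOOP, with TA = 006 and (PC) = 01A
print(encode_format3(0x3C, 1, 1, 0, 0, 1, 0, pc_relative_disp(0x006, 0x01A)))
```

The two low-order bits of the 8-bit opcode are dropped because the n and i flags occupy those positions in the assembled instruction.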
(b) Base Relative addressing
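Consider the statement in line 160, which stores the character in register A into BUFFER,
indexed by register X. (PC) for this instruction is 1051, so the displacement needed for
program-counter relative addressing, 036 - 1051, is outside the 12-bit range and base
relative addressing is used instead. The base register holds 0033, the address of LENGTH,
as loaded in line 12 and declared to the assembler by the BASE directive in line 13. For
this instruction, the displacement is calculated using base relative addressing:
TA = (B) + Disp
Target address (TA) = BUFFER = 036 and (B) = 033
Displacement = TA - (B)
= 036 - 033 = 003
With n = i = 1, x = 1, b = 1, p = 0 and e = 0, the object code is 57C003.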
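The order in which the assembler tries the two relative addressing modes can be sketched as follows. This is an illustrative function with a hypothetical name; it returns the b and p flag bits together with the displacement, and assumes the base register contents are known from a BASE directive.

```python
def choose_displacement(target, pc, base=None):
    """Return (b, p, disp) for a format 3 instruction, trying PC-relative first."""
    disp = target - pc
    if -2048 <= disp <= 2047:          # signed 12-bit range
        return 0, 1, disp & 0xFFF
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:          # unsigned 12-bit range
            return 1, 0, disp
    raise ValueError("neither PC-relative nor base relative addressing applies")


# Line 160: STCH BUFFER,X with TA = 0036, (PC) = 1051 and (B) = 0033
b, p, disp = choose_displacement(0x0036, 0x1051, base=0x0033)
```

When the function raises an error, the programmer would have to request the 4-byte extended format with the + prefix.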
(c) Immediate Addressing
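Consider the statement in line 12, which loads the address of LENGTH into the base
register B. In immediate addressing, the operand value itself, rather than the contents of a
memory location, is used as the operand, and an immediate constant is assembled directly
into the displacement field. If the value fits into 12 bits, a format 3 instruction is used;
otherwise a format 4 instruction is needed, as in line 133 (+LDT #4096).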
(ii) Program Relocation
An object program that contains the information necessary to adjust its addresses at load
time is called a relocatable program. The assembler identifies for the loader those parts of
the object program that need modification. The memory addresses of operands must be
modified according to the address at which the program is loaded, while constant data
must remain unchanged. The assembler conveys this information to the loader using
Modification records. Program relocation is needed for the following reasons:
It is desirable to load and run several programs at the same time.
The system must be able to load programs into memory wherever there is room.
The exact starting address of the program is not known until load time.
The modification record
The assembler produces a modification record describing the address and length of an
address field to be modified. The loader will add the beginning address of the loaded
program to the address field specified by a modification record.
Col. 1 M
Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col. 8-9 Length of the address field to be modified, in half-bytes
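How a loader might apply such a record can be sketched in Python. The function names are hypothetical. The example values are assumptions for illustration: an extended-format +JSUB RDREC at relative address 0006 assembled as 4B101036 (RDREC is at 1036 in the earlier listing), and a load address of 5000.

```python
def modification_record(field_start, half_bytes):
    """Col 1 'M', cols 2-7 start of the address field, cols 8-9 length in half-bytes."""
    return f"M{field_start:06X}{half_bytes:02X}"


def apply_modification(memory, load_address, field_start, half_bytes):
    """Add load_address to an address field of a program held in memory.

    memory is indexed by address relative to the start of the program. When
    half_bytes is odd, the field is the low-order part of its first byte.
    """
    nbytes = (half_bytes + 1) // 2
    raw = int.from_bytes(memory[field_start:field_start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    patched = (raw & ~mask) | ((raw + load_address) & mask)
    memory[field_start:field_start + nbytes] = patched.to_bytes(nbytes, "big")


memory = bytearray(16)
memory[6:10] = bytes.fromhex("4B101036")    # +JSUB RDREC at relative address 0006
record = modification_record(7, 5)          # 20-bit address field starts at byte 0007
apply_modification(memory, 0x5000, 7, 5)
```

Only the 20-bit address field changes; the opcode and flag bits that share the first byte of the field are preserved by the mask.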
Example
Figure 2.3 – Program Relocation
2.5 Machine Independent Assembler Features
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
(i) Literals
It is convenient for the programmer to be able to write the value of a constant operand as a
part of the instruction that uses it. Such an operand is called a literal.
45 001A ENDFIL LDA =C'EOF' 032010
In this assembler language notation, a literal is identified with the prefix =, which is
followed by a specification of the literal value.
The difference between a literal and an immediate operand
With immediate addressing, the operand value is assembled as a part of the machine
instruction.
55 0020 LDA #3 010003
With a literal, the assembler generates the specified value as a constant at some
other memory location. The address of this generated constant is used as the target
address for the machine instruction.
45 001A ENDFIL LDA =C'EOF' 032010
Literal pool
• All of the literal operands used in the program are gathered together into one or
more literal pools.
• Normally literals are placed into a pool at the end of the program.
• Sometimes, it is desirable to place literals into a pool at some other location in the
object program.
– LTORG directive is introduced for this purpose.
– When the assembler encounters a LTORG, it creates a pool that contains all of
the literals used since the previous LTORG.
Literal for current value of location counter
The current value of the location counter can be denoted by the literal operand =*.
BASE *
LDB =*
If the literal =* is used more than once in a program, the occurrences have identical names
but different values, and each must appear in the literal pool.
Handling duplicate literal
The assembler should avoid storing duplicate literals. The easiest way to recognize
duplicate literals is by comparison of the character strings defining them.
For example,
215 1062 WLOOP TD =X’05’
230 106B WD =X’05’
In this case, the literal =X'05' is used twice. To avoid duplication, the details of the literal
are entered into the literal table (LITTAB) only once.
Literal table (LITTAB)
The basic data structure needed to process literal operands is a literal table (LITTAB).
LITTAB is often organized as a hash table, using the literal name or value as the key.
Each entry consists of the literal name, its hexadecimal value and length, and the address
assigned to it.
Example
Literal   Hexadecimal value   Length   Address
C'EOF'    454F46              3        002D
X'05'     05                  1        1076
Implementation of Literals
Pass 1
For each literal operand recognized, search LITTAB. If the literal is already present
in the table, no action is needed; if it is not present, the literal is added to LITTAB
without an address assigned.
When a LTORG statement or the end of the program is encountered, the assembler
scans LITTAB and assigns an address to each literal that does not yet have one.
The location counter is updated to reflect the number of bytes occupied by each literal.
Pass 2
Search LITTAB for each literal operand encountered.
The data values specified by the literals in each literal pool are inserted at the
appropriate places in the object program, in the same way as values generated by
BYTE or WORD statements.
If a literal value represents an address in the program, the assembler must generate
the appropriate Modification record.
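The Pass 1 handling of literals can be sketched in Python as follows. This is a simplified illustration with hypothetical names; it handles only C'...' and X'...' literals and leaves out the special literal =*.

```python
def literal_value(literal):
    """Hexadecimal value of a literal such as =C'EOF' or =X'05'."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return "".join(f"{ord(ch):02X}" for ch in body)
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported literal: {literal}")


class LiteralTable:
    """LITTAB keyed by the literal name, so duplicates are stored only once."""

    def __init__(self):
        self.entries = {}

    def add(self, literal):
        """Enter a literal without an address, unless it is already present."""
        if literal not in self.entries:
            value = literal_value(literal)
            self.entries[literal] = {"value": value, "length": len(value) // 2,
                                     "address": None}

    def place_pool(self, locctr):
        """On LTORG or END: assign addresses to unplaced literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr


littab = LiteralTable()
littab.add("=C'EOF'")
after_first_pool = littab.place_pool(0x002D)   # LTORG reached at 002D
littab.add("=X'05'")
littab.add("=X'05'")                           # duplicate, ignored
after_second_pool = littab.place_pool(0x1076)  # END reached at 1076
```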
(ii) Symbol-Defining Statements
EQU directives
Assemblers provide the directive EQU, which allows the programmer to define a symbol
and specify its value. The general syntax of EQU is:
symbol EQU value
When the assembler encounters an EQU statement, it enters “symbol” into SYMTAB with
the specified value.
Use of EQU
(i) Establish symbolic names that can be used for improved readability in place of numeric
values. For example, the statement
+LDT #4096
can be replaced by
MAXLEN EQU 4096
+LDT #MAXLEN
(ii) Define mnemonic names for registers.
A EQU 0
X EQU 1
L EQU 2
(iii) Establish and use names that reflect the logical function of the registers in the program.
BASE EQU R1
COUNT EQU R2
INDEX EQU R3
ORG directives
The assembler directive ORG is used to assign values to symbols indirectly. The
general syntax of the ORG directive is:
ORG value
“value” is a constant or an expression involving constants and previously defined symbols.
When this statement is encountered, the assembler resets its location counter (LOCCTR) to
the specified value. Since the values of symbols are taken from LOCCTR, the ORG statement
affects the values of all labels defined until the next ORG.
Use ORG for label definition
Suppose that we want to define a symbol table in which each entry consists of three fields:
SYMBOL field - 6 bytes
VALUE field - 3 bytes (1 word)
FLAGS field - 2 bytes
The SYMBOL field contains a user-defined symbol, the VALUE field holds the value assigned
to the symbol, and the FLAGS field specifies the symbol type and other information.
[Figure: layout of STAB (100 entries), each with SYMBOL, VALUE and FLAGS fields]
To reserve space for this table, we can write
STAB RESB 1100
In total, 1100 bytes are reserved for 100 entries of 11 bytes each.
The EQU directive can be used to define the labels SYMBOL, VALUE and FLAGS for the fields
of the table:
STAB RESB 1100
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
We can fetch the VALUE field from the table entry indicated by the contents of register X
using
LDA VALUE, X
The same symbol definition using ORG is as follows:
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAGS RESB 2
ORG STAB+1100
The first ORG resets LOCCTR to the value of STAB, and the last ORG sets LOCCTR
back to its previous value.
Restrictions of EQU and ORG in an ordinary two-pass assembler
For an ordinary two-pass assembler, all symbols must be defined during Pass 1, so all terms
used to specify the value of a new symbol must have been defined previously in the
program. Hence, the sequences marked (not valid) below cannot be processed by an
ordinary two-pass assembler.
Example1:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1 (not valid)
Example2:
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1 (not valid)
Example3:
BETA EQU ALPHA
ALPHA RESW 1 (not valid)
Example4:
ALPHA RESW 1
BETA EQU ALPHA (valid)
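The restriction can be illustrated with a small Python sketch of how Pass 1 might process an EQU statement. The function name is hypothetical, and the operand is limited to a decimal constant or a single symbol for brevity.

```python
def define_equ(symtab, symbol, operand):
    """Enter symbol into symtab; the operand must already have a known value."""
    if operand.isdigit():
        symtab[symbol] = int(operand)
    elif operand in symtab:
        symtab[symbol] = symtab[operand]
    else:
        # A forward reference: the value cannot be resolved during Pass 1.
        raise KeyError(f"{operand} is not yet defined")


# Example 4 (valid): ALPHA is defined before BETA refers to it.
symtab = {"ALPHA": 0x1000}
define_equ(symtab, "BETA", "ALPHA")

# Example 3 (not valid): BETA refers to ALPHA before ALPHA is defined.
try:
    define_equ({}, "BETA", "ALPHA")
    forward_reference_rejected = False
except KeyError:
    forward_reference_rejected = True
```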
(iii) Expressions
Most assemblers allow the use of expressions whenever a single operand such as a label or
literal is permitted. Each such expression must be evaluated by the assembler to produce a
single operand address or value. Assemblers generally allow arithmetic expressions
formed according to the normal rules using the operators +, -, *, and /. Individual terms in
the expression may be constants, user-defined symbols, or special terms. The most
common special term is the current value of the location counter (often designated by *).
Types of terms
Absolute terms -> The value of an absolute term is independent of program location.
Relative terms -> The value of a relative term is dependent on the beginning address of the
program.
Types of expressions
By the type of value produced, expressions can be classified as:
Absolute expressions: The value of an absolute expression is independent of the
program location. An absolute expression may contain relative terms provided the
relative terms occur in pairs and the terms in each such pair have opposite signs. No
relative term may enter into a multiplication or division operation.
• e.g. MAXLEN EQU BUFEND-BUFFER
Relative expressions: The value of a relative expression is relative to the beginning
address of the object program. In a relative expression, all the relative terms except one
can be paired as described above, and the remaining unpaired relative term must have a
positive sign. Expressions that are neither relative nor absolute should be flagged by the
assembler as errors. Relative expressions can be written as
S+r
where S is the starting address of the program
r is the relative term related to the beginning of the program.
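The pairing rule can be sketched in Python by counting relative terms with their signs: a net count of zero gives an absolute expression and a net count of +1 gives a relative one. This sketch handles only + and -, uses a hypothetical function name, and assumes a symbol table that records whether each symbol is relative. The address 1036 for BUFEND is an assumed value for illustration.

```python
import re


def classify(expression, symtab):
    """Return (value, kind) where kind is 'absolute' or 'relative'.

    symtab maps a symbol name to a (value, is_relative) pair.
    """
    value = 0
    relative_count = 0  # +1 for each added relative term, -1 for each subtracted
    for sign, term in re.findall(r"([+-]?)\s*(\w+)", expression):
        factor = -1 if sign == "-" else 1
        if term.isdigit():
            term_value, is_relative = int(term), False
        else:
            term_value, is_relative = symtab[term]
        value += factor * term_value
        if is_relative:
            relative_count += factor
    if relative_count == 0:
        return value, "absolute"
    if relative_count == 1:
        return value, "relative"
    raise ValueError("expression is neither absolute nor relative")


SYMBOLS = {"BUFFER": (0x0036, True), "BUFEND": (0x1036, True)}
maxlen = classify("BUFEND-BUFFER", SYMBOLS)
```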
(iv) Program Blocks
Program blocks are segments of code that are rearranged within a single object program
unit. The assembler directive USE indicates which portions of the source program belong
to the various blocks. The general syntax of the USE directive is:
USE [block name]
For example, a program may use three blocks:
o Unnamed Program block (default block)
o CDATA block
o CBLKS block
At the beginning, statements are assumed to be part of the unnamed (default) block. If no
USE statements are included, the entire program belongs to this single block. Each program
block may actually contain several separate segments of the source program.
Implementation of Program Blocks
Pass 1
Each program block has a separate location counter
Each label is assigned an address that is relative to the start of the block that
contains it
At the end of Pass 1, the latest value of the location counter for each block indicates
the length of that block
The assembler can then assign to each block a starting address in the object
program
Pass 2
The address of each symbol can be computed by adding the assigned block starting
address and the relative address of the symbol to that block
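The two steps can be sketched in Python as follows. The function names are hypothetical, and the block lengths and the symbol offset are assumed values chosen only to illustrate the arithmetic.

```python
def assign_block_starts(block_lengths, program_start=0):
    """End of Pass 1: lay the blocks out one after another.

    block_lengths maps block name -> length, in order of first appearance.
    """
    starts, address = {}, program_start
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def symbol_address(symtab, starts, symbol):
    """Pass 2: block starting address plus the symbol's address within its block."""
    block, offset = symtab[symbol]
    return starts[block] + offset


lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
starts = assign_block_starts(lengths)
symtab = {"LENGTH": ("CDATA", 0x0003)}
length_address = symbol_address(symtab, starts, "LENGTH")
```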
2.6 One Pass Assembler
A one-pass assembler is used when it is necessary or desirable to avoid a second pass over
the program. A one-pass assembler scans the program just once. The main problem in
assembling a program in one pass involves forward references. Forward references to data
items can be eliminated by requiring that all storage reservation statements be defined
before they are referenced. But forward references to labels on instructions cannot be
eliminated as easily, because the logic of a program often needs a forward jump. The
one-pass assembler must therefore make some special provision for handling forward
references.
Two types of one-pass assemblers
1. One type of one-pass assembler produces object code directly in memory for immediate
execution.
• No object program is written out.
• No loader is needed.
2. The other type of one-pass assembler produces the usual kind of object program for
later execution.
Load-and-go assembler
An assembler that does not write the object program out and does not need a loader is
called a load-and-go assembler.
• It avoids the overhead of writing the object program out and reading it back in.
• It is useful in a system that is oriented toward program development and testing.
• A load-and-go assembler can be a one-pass assembler or a two-pass assembler.
Handling of forward references in one-pass load-and-go assembler
The assembler generates object code instructions as it scans the source program.
If an instruction operand is a symbol that has not yet been defined,
the symbol is entered into the symbol table with a flag indicating that the symbol is
undefined;
the operand address is omitted when the instruction is assembled;
the address of the operand field of the instruction is added to a list of forward
references associated with the symbol table entry.
When the definition of a symbol is encountered, the forward reference list for that symbol
is scanned, and the proper address is inserted into any instructions previously generated.
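This handling of forward references can be sketched in Python. The sketch assumes the standard SIC format, in which the two bytes after the opcode hold the operand address, and a bytearray standing in for memory indexed by address. The class and method names are hypothetical.

```python
class OnePassSymtab:
    """Symbol table with a forward reference list for each undefined symbol."""

    def __init__(self, memory):
        self.memory = memory   # object code generated so far
        self.defined = {}      # symbol -> address
        self.pending = {}      # undefined symbol -> addresses of operand fields

    def reference(self, symbol, operand_field_address):
        """Return the symbol's address, or 0 after recording a forward reference."""
        if symbol in self.defined:
            return self.defined[symbol]
        self.pending.setdefault(symbol, []).append(operand_field_address)
        return 0

    def define(self, symbol, address):
        """Record the definition and patch every instruction that was waiting."""
        self.defined[symbol] = address
        for field in self.pending.pop(symbol, []):
            self.memory[field:field + 2] = address.to_bytes(2, "big")


memory = bytearray(0x20)
symtab = OnePassSymtab(memory)
memory[0] = 0x3C                                  # J LOOP assembled at address 0000
memory[1:3] = symtab.reference("LOOP", 1).to_bytes(2, "big")
before = memory[0:3].hex().upper()                # operand address still omitted
symtab.define("LOOP", 0x0010)                     # LOOP is defined later, at 0010
after = memory[0:3].hex().upper()
```

Any symbol still left in the pending lists when END is reached is an undefined symbol and would be reported as an error.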
Algorithm
Begin
Read first input line
if OPCODE=’START’ then
System Software
Prepared by P.Vasantha kumari 40
Begin
Save #[OPERAND] a starting address
Initialize LOCCTR to starting address
Write line to intermediate file
Read next input line
End (if START)
Else
Initialize LOCCTR to 0
While OPCODE ≠’END’ do
Begin
if this is not a comment line then
Begin
if there is a symbol in the LABEL filed then
Begin
Search SYMTAB for LABEL
if found then
Begin
if <symbol value> is NULL then
Begin
set <symbol value> to LOCCTR in SYMTAB
for each address in the symbol's linked list of forward references,
store the symbol value as the operand address at that location
delete the linked list
End
else
set error flag (duplicate symbol)
End
else
Insert (LABEL, LOCCTR) into SYMTAB
End {if symbol}
Search OPTAB for OPCODE
if found then
Begin
Search SYMTAB for OPERAND
If found then
If symbol value is not NULL then
Store symbol value as operand address
Else
Append a node to the end of the symbol's linked list, with
address LOCCTR
Else
Insert (symbol name, NULL) into SYMTAB and start its linked list
with a node whose address is LOCCTR
Add 3 to LOCCTR
End
else if OPCODE=’WORD’ then
Add 3 to LOCCTR and convert constant to object code
else if OPCODE=’RESW’ then
Add 3 * #[OPERAND] to LOCCTR
else if OPCODE=’RESB’ then
Add #[OPERAND] to LOCCTR
else if OPCODE=’BYTE’ then
Begin
Find length of constant in bytes
Add length to LOCCTR
Convert constant to object code
End
If object code will not fit into current Text Record then
Begin
Write Text Record to object program initialize new Text Record
End
Add object code to Text Record
End
Write listing line
Read next input line
End
Write last Text Record to object program
Write End Record to object program
Write last listing line
End {one pass}
Explanation
Step 1: Read the input line.
Step 2: Check to see if the opcode field in the input line is “START”.
1. Find if there is any operand field after START; initialize the LOCCTR to the
operand value.
2. Otherwise if there is no value in the operand field the LOCCTR is set to zero.
Step 3: Write the line onto the output file.
Step 4: Repeat the following for the other lines in the input file until the opcode field
contains END directive.
1. If there is a symbol in the label field.
i. Check the symbol table to see if it has already been stored and is marked as an
undefined entry. If so, update the symbol table with the proper address, mark it as
a defined entry, and insert that address into every instruction on its forward
reference list.
ii. Otherwise the symbol is entered into the symbol table along with the current
LOCCTR value as its address.
2. If there is an opcode in the opcode field
i. Search the OPTAB to see if the opcode is present; if so, increment the location
counter (LOCCTR) by three.
ii. a) If the opcode is WORD, increment the LOCCTR by three and convert the
constant in the operand field to object code.
b) If the opcode is BYTE, increment the LOCCTR by the length of the constant in
bytes and convert the constant in the operand field to object code.
c) If the opcode is RESW, increment the LOCCTR by three times the integer value
of the operand. No object code is generated.
d) If the opcode is RESB, increment the LOCCTR by the integer value of the
operand. No object code is generated.
3. If there is a symbol in the operand field.
i. Check the symbol table to see if it has already been defined. If so, then assemble
the object code by combining the machine code equivalent of the instruction with
the symbol address.
ii. Otherwise the symbol is entered into the symbol table (if it is not already
there), marked as an undefined entry, and the address of this operand field is
added to its forward reference list.
4. If there is no symbol in the operand field, then operand address is assigned as zero,
and it is assembled with the machine code equivalent of the instruction.
5. Write the input line along with the object code onto output file.
Step 5: Close all the opened files and exit.
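The LOCCTR updates for the storage directives in Step 4 can be sketched as a small Python function. The function name `directive_length` is hypothetical, and the sketch assumes BYTE operands are written as C'...' or X'...' with an even number of hexadecimal digits.

```python
def directive_length(opcode, operand):
    """Return the number of bytes by which LOCCTR advances for a SIC storage directive."""
    if opcode == "WORD":
        return 3                      # one 3-byte word
    if opcode == "RESW":
        return 3 * int(operand)       # operand counts words
    if opcode == "RESB":
        return int(operand)           # operand counts bytes
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]   # strip the C'...' or X'...' wrapper
        if kind == "C":
            return len(body)          # one byte per character
        if kind == "X":
            return len(body) // 2     # two hex digits per byte (assumed even)
        raise ValueError(f"unsupported BYTE constant {operand}")
    raise ValueError(f"not a storage directive: {opcode}")
```

Only WORD and BYTE also produce object code; RESW and RESB merely reserve space, so the function returns a length for them and nothing is emitted.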
Example
Source Program with object code
Memory Address Contents
1000 454F4600 00030000 00xxxxxx xxxxxxxx
1010 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
.
.
2000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010 100948-- --00100C 28100630 ----48--
2020 --3C2012
.
Symbol Table
2.7 Multi Pass Assembler
A multi-pass assembler translates the assembly language program into
machine code or object code in multiple passes. It is used to resolve forward references
in symbol definitions, making as many passes as are necessary to process the
definitions of symbols. It is unnecessary for a multi-pass assembler to make more than two
passes over the entire program. Instead, only the parts of the program involving forward
references need to be processed in multiple passes. The method presented here can be
used to process any kind of forward reference. If we use a two-pass assembler, the
following symbol definitions cannot be allowed.
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
This is because ALPHA and BETA cannot be defined in pass 1: each refers to a symbol that is
defined later in the program. If we allow multi-pass processing, DELTA is defined in pass 1,
BETA is defined in pass 2, and ALPHA is defined in pass 3, so the above definitions can be
allowed. This is the motivation for using a multi-pass assembler.
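The pass-by-pass behaviour for ALPHA, BETA and DELTA can be illustrated with a short Python sketch. It simply rescans the definitions in source order until nothing is left undefined; the function name `resolve_in_passes` and the address 0x1000 for DELTA are assumptions made for the example.

```python
# Each definition is (symbol, right-hand side). An int stands for a label whose
# address is known from LOCCTR; a string names the symbol an EQU refers to.
definitions = [("ALPHA", "BETA"), ("BETA", "DELTA"), ("DELTA", 0x1000)]

def resolve_in_passes(definitions):
    """Scan the definitions top to bottom repeatedly.

    Returns (values, defined_in), where defined_in maps each symbol to the
    number of the pass in which it became defined.
    """
    values, defined_in = {}, {}
    pass_no = 0
    while len(values) < len(definitions):
        pass_no += 1
        progress = False
        for name, rhs in definitions:
            if name in values:
                continue                      # already defined in an earlier pass
            if isinstance(rhs, int):
                values[name] = rhs            # ordinary label
            elif rhs in values:
                values[name] = values[rhs]    # EQU to an already-defined symbol
            else:
                continue                      # forward reference: try again next pass
            defined_in[name] = pass_no
            progress = True
        if not progress:
            raise ValueError("undefined symbols remain")
    return values, defined_in

values, defined_in = resolve_in_passes(definitions)
```

Rescanning everything is wasteful, which is why the method described next keeps dependency lists and reprocesses only the symbols affected by each new definition.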
A multi-pass assembler uses the symbol table to store symbols that are not yet
completely defined. For an undefined symbol, we store in its entry the names and the number
of the undefined symbols which contribute to the calculation of its value. We also keep a list of
the symbols whose values depend on this symbol. When a symbol becomes
defined, we use its value to reevaluate all of the symbols that are kept in this
list. This step is performed recursively.
Example
The following symbol-defining statements involve forward references.
HALFSZ EQU MAXLEN/2
MAXLEN EQU BUFEND-BUFFER
PREVBT EQU BUFFER-1
BUFFER RESB 4096
BUFEND EQU *
A two-pass assembler cannot resolve these definitions, because several of them refer to
symbols that are defined later. The following figure displays the symbol table entries
resulting from pass 1 processing of the first statement.
HALFSZ EQU MAXLEN/2
MAXLEN has not yet been defined, so the value of HALFSZ cannot be computed. The defining
expression for HALFSZ is stored in the symbol table in place of its value. The entry &1
indicates that one symbol in the defining expression is undefined.
For the next statement,
MAXLEN EQU BUFEND-BUFFER
There are two undefined symbols involved in the definition: BUFEND and
BUFFER. Both of these are entered into SYMTAB with lists indicating the dependence of
MAXLEN upon them. The symbol table after the second statement is shown in the following
figure.
The next figure shows the defining symbol details of the third statement
PREVBT EQU BUFFER-1
In this case, PREVBT is added to the symbol table as a new undefined symbol. Because its
value depends on BUFFER, PREVBT is also added to the dependency list of BUFFER.
The following figure shows the symbol definitions for the statement
BUFFER RESB 4096
In this case, the symbol BUFFER is defined. PREVBT depends only on BUFFER, so it can now be
evaluated. MAXLEN still depends on the undefined symbol BUFEND, so it remains undefined.
The next figure shows the complete symbol table entries after defining the statement
BUFEND EQU *
In this case, the symbol BUFEND is defined. The current value of LOCCTR is assigned to
BUFEND. From this definition, MAXLEN and then HALFSZ can be determined.
This completes the symbol definition process. If any symbols remain
undefined at the end of the program, the assembler flags them as errors.
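The symbol table mechanism walked through above can be sketched in Python. This is an illustration only: the entry layout, the helper names `entry`, `evaluate`, `settle` and `define`, and the address 0x1034 assumed for BUFFER are not taken from the notes. The sketch uses `eval` purely to keep expression evaluation short.

```python
import re

# name -> {"value": int or None, "expr": defining expression or None,
#          "missing": number of undefined operands, "dependents": [names]}
symtab = {}

def entry(name):
    return symtab.setdefault(
        name, {"value": None, "expr": None, "missing": 0, "dependents": []})

def evaluate(expr):
    # Evaluate using the symbols defined so far; "/" is treated as integer division.
    env = {n: e["value"] for n, e in symtab.items() if e["value"] is not None}
    return eval(expr.replace("/", "//"), {"__builtins__": {}}, env)

def settle(name, value):
    """Give `name` its value, then revisit every symbol that was waiting on it."""
    e = entry(name)
    e["value"] = value
    waiting, e["dependents"] = e["dependents"], []
    for dep in waiting:
        d = symtab[dep]
        d["missing"] -= 1
        if d["missing"] == 0:
            settle(dep, evaluate(d["expr"]))   # the recursive step

def define(name, expr):
    """Process `name EQU expr`; expr may mention symbols that are not yet defined."""
    e = entry(name)
    e["expr"] = expr
    operands = set(re.findall(r"[A-Za-z]\w*", expr))
    undefined = [s for s in operands if entry(s)["value"] is None]
    if undefined:
        e["missing"] = len(undefined)          # the "&n" count in the entry
        for s in undefined:
            symtab[s]["dependents"].append(name)
    else:
        settle(name, evaluate(expr))

BUFFER_ADDR = 0x1034                     # assumed LOCCTR value when BUFFER is reached
define("HALFSZ", "MAXLEN/2")
define("MAXLEN", "BUFEND-BUFFER")
define("PREVBT", "BUFFER-1")
settle("BUFFER", BUFFER_ADDR)            # BUFFER RESB 4096
settle("BUFEND", BUFFER_ADDR + 4096)     # BUFEND EQU *

# Anything still undefined here would be flagged as an error.
unresolved = [n for n, e in symtab.items() if e["value"] is None]
```

With the assumed address, defining BUFFER settles PREVBT immediately, while MAXLEN and HALFSZ are settled only when BUFEND receives its value, matching the order described in the text.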
2.8 Implementation Examples – MASM Assembler
A MASM assembly language program is written as a collection of segments. Each
segment is defined as belonging to a particular class, corresponding to its contents.
Commonly used classes are CODE, DATA, CONST and STACK.
Segments are addressed via x86 segment registers during program execution.
Code segments are addressed using the code segment register CS, and stack segments
are addressed using the stack segment register SS. These segment registers are
automatically set by the system loader when a program is loaded for execution.
Register CS is set to indicate the segment that contains the starting address specified
in the END statement of the program. Data segments, including constant segments,
are addressed using DS, ES, FS, or GS. The segment register can be specified
explicitly by the programmer. If the programmer does not specify a segment
register, one is selected by the assembler.
By default, the assembler assumes that all references to data segments use register DS.
This assumption can be changed using the assembler directive ASSUME. For example,
ASSUME ES: DATASEG1
tells the assembler to assume the register ES indicates the segment DATASEG1.
Jump instructions are assembled in two different ways, depending on whether the
target of the jump is in the same code segment or in different code segment. A near
jump is a jump to a target in the same code segment; a far jump is a jump to a target
in a different code segment.
A near jump is assembled using the current code segment register CS, whereas a far jump must
be assembled using a different segment register, which is specified in an instruction
prefix. A near jump instruction occupies 2 or 3 bytes of memory (depending upon
whether the jump address is within 128 bytes of the current instruction), whereas a
far jump occupies 5 bytes of memory.
By default, MASM assumes that a forward jump is a near jump. If the target of the
jump is in another code segment, the programmer must warn the assembler by
writing
JMP FAR PTR TARGET
If the jump address is within 128 bytes of the current instruction, the programmer
can specify the shorter (2-bytes) near jump by writing
JMP SHORT TARGET
If the JMP to TARGET is a far jump and the programmer does not specify FAR PTR, a
problem occurs. During pass 1, the assembler reserves 3 bytes for the jump
instruction, but the actual assembled instruction requires 5 bytes. In earlier versions
of MASM, this causes an error. In later versions, the assembler can repeat pass 1 to
generate the correct location counter values.
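The size mismatch described above can be made concrete with a small Python sketch. It models only the byte counts quoted in this section (2, 3 and 5); the function names are hypothetical, and "within 128 bytes" is interpreted here as an absolute distance below 128, which is an assumption.

```python
SHORT_LEN, NEAR_LEN, FAR_LEN = 2, 3, 5   # jump instruction lengths in bytes

def reserved_in_pass1(modifier=None):
    """Bytes reserved for a forward JMP, given the programmer's modifier (if any)."""
    sizes = {None: NEAR_LEN, "SHORT": SHORT_LEN, "FAR PTR": FAR_LEN}
    return sizes[modifier]

def actual_length(same_segment, distance):
    """Bytes the jump really needs once the target is known."""
    if not same_segment:
        return FAR_LEN
    return SHORT_LEN if abs(distance) < 128 else NEAR_LEN

def needs_another_pass(modifier, same_segment, distance):
    """True when the space reserved in pass 1 is too small for the real instruction."""
    return reserved_in_pass1(modifier) < actual_length(same_segment, distance)
```

For an unmarked forward jump to another code segment, `needs_another_pass(None, False, 0)` is true because 3 bytes were reserved and 5 are required; writing FAR PTR makes the reservation correct from the start.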
The length of an instruction depends on the operands that are used. Immediate
operands may occupy from 1 to 4 bytes in the instruction. An operand that specifies
a memory location may take varying amounts of space in the instruction.
Segments in a MASM program can be written in more than one part. If a segment directive
specifies the same name as a previously defined segment, it is considered to be a
continuation of that segment.
References between segments that are assembled together are handled by the assembler;
external references are handled by the linker.