
System Software

Prepared by P.Vasantha kumari

UNIT II ASSEMBLERS

2.1 Basic assembler functions

Assembler

An assembler converts assembly language programs into object files. Object files contain a combination of machine instructions, data, and the information needed to place instructions properly in memory.

Figure 2.1 - Assembler

Assembler functions

Convert mnemonic operation codes to their machine language equivalents

Convert symbolic operands to their equivalent machine addresses.

Build the machine instructions in the proper format.

Convert the data constants to internal machine representations

Write the object program and the assembly listing

Error checking is provided

Changes can be quickly and easily incorporated with a reassembly

Features of assemblers

Mnemonic operation codes

Symbolic operands

Data declarations

Assembly Language Statements

Three types of statements are used in assembly language. They are:

1. Imperative Statements

2. Declaration Statements

3. Assembler Directives

1. Imperative statements

An imperative statement indicates an action to be performed during the execution of the assembled program; its effect is a change to the contents of registers or memory. Each imperative statement translates into one machine instruction.

2. Declaration Statements

A declaration statement declares a constant or reserves storage for data and associates a symbolic name with it. It describes what the data is rather than an action to be performed, so it does not translate into an executable machine instruction.

3. Assembler Directives

An assembler directive instructs the assembler to perform certain actions during the assembly of a program. Directives can be used to declare variables, reserve storage space and declare constants.

Some of the assembler directives are:

• START Specifies the name and starting address of the program

• END indicates the end of the source program and (optionally) specifies the first

executable instruction in the program

• BYTE Generate character or hexadecimal constant, occupying as many bytes as

needed to represent the constant

• WORD Generate one-word integer constant

Statement Format

All statements in an assembly language program have the form

[LABEL] <OPCODE> <OPERAND> COMMENTS

Label: An optional identifier that names the location of the data or code on that line, so that other statements can refer to it. The maximum label length depends on the assembler; many assemblers support labels of up to 32 characters. In the notation used here, a label begins with a letter [A…Z] and is followed by a colon (:).

Example: START: LDA #24

OPCODE: The mnemonic for the operation code (machine instruction) or assembler directive. Most opcodes require operands.

OPERAND: Specifies a constant, a label, immediate data, a register, or an address on which the instruction operates.

Advantages of assembler

Reduces errors

Faster translation times

Changes could be made easier and faster

Disadvantages of assembler

Many instructions are required to achieve small tasks.

Source Programs tend to be large and difficult to follow

Programmers require knowledge of the processor architecture and instruction set

Programs are machine dependent and require complete rewrites if the hardware is changed

2.2 A simple SIC assembler

Assembler Function

A simple SIC (Simplified Instructional Computer) assembler performs the following functions:

Convert mnemonic operation codes to their machine language equivalents

Convert symbolic operands to their equivalent machine addresses.

Build the machine instructions in the proper format.

Convert the data constants to internal machine representations

Write the object program and the assembly listing

Assembler directives

The SIC assembler language has the following assembler directives.

START Specify name and starting address for the program

END Indicate the end of the source program and (optionally) specify the first

executable instruction in the program

BYTE Generate character or hexadecimal constant, occupying as many bytes as

needed to represent the constant

WORD Generate one-word integer constant

RESB Reserve the indicated number of bytes for a data area

RESW Reserve the indicated number of words for a data area

Assemblers

An assembler converts assembly language into machine code (object code).

There are two types of assemblers:

o Two pass assembler

Pass 1 assembler

Pass 2 assembler

o One pass assembler

In two pass assembler, the first pass scans the source program for label definitions

and assigns addresses whereas the second performs most of the actual translation.

The assembler must process the assembler directives statements. These statements

are not translated into machine instructions. They provide instructions to the

assembler itself.

The assembler directive START specifies the starting address of the object program and END marks the end of the program.

Assembler must write object code onto some output device. This object program

will later be loaded into memory for execution.

An object program contains three types of records:

Header

Text

End

Header record contains the program name, starting address and length. Text record

contains the translated instructions and data of the program, together with an

indication of the addresses where these are to be loaded. End record marks the end

of the object program and specifies the address in the program where execution is

to begin.

Functions of Assemblers

Pass 1(define symbols)

Assign addresses to all statements in the program.

Save the values (addresses) assigned to all labels for use in Pass 2.

Perform some processing of assembler directives.

Include processing that affects address assignment such as determining the length

of data areas defined by BYTE, RESW, etc.

Pass 2 (assemble instructions and generate object program)

Assemble instructions (translating operation codes and looking up addresses)

Generate data values defined by BYTE, WORD, etc.

Perform processing of assembler directives not done during Pass 1.

Write the object program and the assembly listing.

Format of Object Program

Header Record

Col. 1 H

Col. 2-7 Program Name

Col. 8-13 Starting address of object program (hexadecimal)

Col. 14-19 Length of object program in bytes (hexadecimal)

Text Record

Col. 1 T

Col. 2-7 Starting address for object code in this record (hexadecimal)

Col. 8-9 Length of object code in this record in bytes (hexadecimal)

Col. 10-69 Object code (hexadecimal, two columns per byte)

End Record

Col. 1 E

Col. 2-7 Address of first executable instruction in object program

(hexadecimal)
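The record layouts above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the function names are hypothetical, and the `^` separator is used purely for readability (it does not occupy a column in the formats listed above).

```python
def header_record(name: str, start: int, length: int) -> str:
    """H record: program name (6 columns), start address, length in bytes."""
    return f"H^{name:<6}^{start:06X}^{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """T record: start address, byte count, then the object code itself."""
    body = "".join(object_codes)
    if len(body) > 60:  # columns 10-69 hold at most 60 hex digits (30 bytes)
        raise ValueError("object code does not fit into one Text record")
    return f"T^{start:06X}^{len(body) // 2:02X}^" + "^".join(object_codes)


def end_record(first_instruction: int) -> str:
    """E record: address of the first executable instruction."""
    return f"E^{first_instruction:06X}"


print(header_record("MAIN", 0x2000, 0x14))
print(text_record(0x2000, ["00200C", "0C200F", "502012"]))
print(end_record(0x2000))
```

The length field of a Text record counts bytes, so it is half the number of hexadecimal digits that follow it.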

Example

(i) Assembly Language Program with object Code

LOCCTR  LABEL  OPCODE  OPERAND  OBJ.CODE
        MAIN   START   2000
2000    BEGIN  LDA     NUM1     00200C
2003           STA     NUM2     0C200F
2006           LDCH    CHAR1    502012
2009           STCH    CHAR2    542013
200C    NUM1   WORD    5        000005
200F    NUM2   RESW    1
2012    CHAR1  BYTE    C'A'     41
2013    CHAR2  RESB    1
2014           END     BEGIN

(ii) Object Program

H^MAIN  ^002000^000014

T^002000^0F^00200C^0C200F^502012^542013^000005

T^002012^01^41

E^002000

2.3 Assembler algorithm and Data structures

Internal data structures

Operation Code Table (OPTAB)

Symbol Table (SYMTAB)

Location Counter (LOCCTR)

OPTAB

OPTAB is used to look up mnemonic operation codes and translate them to their

machine language equivalents.

In most cases, OPTAB is a static table. OPTAB must contain the mnemonic operation

code and its machine language equivalent. In more complex assemblers, OPTAB also

contains information about instruction format and length.

OPTAB is usually organized as a hash table, with mnemonic operation code as the

key.

SYMTAB

SYMTAB is used to store values (addresses) assigned to labels.

SYMTAB includes the name and value (address) for each label in the source program

together with flags to indicate error conditions.

• e.g., a symbol defined in two different places

This table may also contain information, such as type or length, about the data area

or instruction labeled.

SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval.

• the label is the key of SYMTAB.

• non-random key

LOCCTR

LOCCTR is a variable used to help in the assignment of addresses.

LOCCTR is initialized to the beginning address specified in the START statement.

After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR.

When a label is encountered in the source program, the current value of LOCCTR gives the address to be associated with that label.
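As a rough sketch, the three data structures map naturally onto Python dictionaries (which are hash tables) and an integer. The four-entry OPTAB, the `errors` list and the helper name `define_label` are assumptions made for this illustration only.

```python
# Hypothetical four-entry OPTAB; a real table lists the full instruction set.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}

symtab: dict[str, int] = {}   # label -> address
errors: list[str] = []        # stands in for per-symbol error flags
locctr = 0x2000               # value taken from the START statement


def define_label(label: str, address: int) -> None:
    """Enter a label into SYMTAB, flagging a duplicate definition."""
    if label in symtab:
        errors.append(f"duplicate symbol: {label}")
    else:
        symtab[label] = address


define_label("BEGIN", locctr)   # label on the first instruction
locctr += 3                     # every SIC instruction is 3 bytes long
define_label("BEGIN", locctr)   # second definition is reported, not stored
```

After these statements SYMTAB still maps BEGIN to 2000, and the duplicate definition is recorded as an error rather than overwriting the first value.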

2.3.1 Two Pass Assemblers

A two-pass assembler translates the assembly language program into object code (machine code) in two passes, pass 1 and pass 2. Pass 1 scans the source program for label definitions and assigns addresses, whereas pass 2 performs the actual translation.

Figure 2.2 – Two Pass Assembler

Pass 1 of a two pass assembler

Step 1: Read the input line.

Step 2: Check to see if the opcode field in the input line is “START”.

(i) Find if there is any operand field after START; initialize the LOCCTR to the

operand value.

(ii) Otherwise, if there is no value in the operand field the LOCCTR is set to zero.

Step 3: Write the line to the intermediate file.

Step 4: Repeat the following for the other lines in the program until the opcode field contains the END directive.

1. If there is a symbol in the label field:

i. Check the symbol table to see if it has already been stored there. If so, it is a duplicate symbol and an error message should be displayed.

ii. Otherwise, the symbol is entered into SYMTAB along with the current LOCCTR value, which is the memory address assigned to it.

2. If there is an opcode in the opcode field:

i. Search the OPTAB to see if the opcode is present; if so, increment the location counter (LOCCTR) by three.

ii. a) If the opcode is WORD, increment the LOCCTR by three.

b) If the opcode is BYTE, increment the LOCCTR by the length of the constant in bytes (one for a single-character constant such as C'A').

c) If the opcode is RESW, increment the LOCCTR by the integer equivalent of the operand value * 3.

d) If the opcode is RESB, increment the LOCCTR by the integer equivalent of the operand value.

3. Write every line processed to the intermediate file along with its location counter value.

Step 5: Calculate the length of the program by subtracting the starting address of the

program from the final value of the LOCCTR

Algorithm

Begin
    Read first input line
    if OPCODE = 'START' then
    Begin
        Save #[OPERAND] as starting address
        Initialize LOCCTR to starting address
        Write line to intermediate file
        Read next input line
    End {if START}
    else
        Initialize LOCCTR to 0
    While OPCODE ≠ 'END' do
    Begin
        if this is not a comment line then
        Begin
            if there is a symbol in the LABEL field then
            Begin
                Search SYMTAB for LABEL
                if found then
                    Set error flag (duplicate symbol)
                else
                    Insert (LABEL, LOCCTR) into SYMTAB
            End {if symbol}
            Search OPTAB for OPCODE
            if found then
                Add 3 {instruction length} to LOCCTR
            else if OPCODE = 'WORD' then
                Add 3 to LOCCTR
            else if OPCODE = 'RESW' then
                Add 3 * #[OPERAND] to LOCCTR
            else if OPCODE = 'RESB' then
                Add #[OPERAND] to LOCCTR
            else if OPCODE = 'BYTE' then
            Begin
                Find length of constant in bytes
                Add length to LOCCTR
            End {if BYTE}
            else
                Set error flag (invalid operation code)
        End {if not a comment}
        Write line to intermediate file
        Read next input line
    End {while not END}
    Write last line to intermediate file
    Save (LOCCTR - starting address) as program length
End {pass 1}

Example

Input - Assembly Language Program

MAIN START 2000

BEGIN LDA NUM1

STA NUM2

LDCH CHAR1

STCH CHAR2

NUM1 WORD 5

NUM2 RESW 1

CHAR1 BYTE C’A’

CHAR2 RESB 1

END BEGIN

Output – Assign addresses to instruction

MAIN START 2000

2000 BEGIN LDA NUM1

2003 ** STA NUM2

2006 ** LDCH CHAR1

2009 ** STCH CHAR2

200C NUM1 WORD 5

200F NUM2 RESW 1

2012 CHAR1 BYTE C’A’

2013 CHAR2 RESB 1

2014 END BEGIN

Output – Symbol table

BEGIN 2000

NUM1 200C

NUM2 200F

CHAR1 2012

CHAR2 2013
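The Pass 1 procedure can be tried out on the example program with a short Python sketch. It is a simplified illustration under several assumptions: the source is already split into (label, opcode, operand) tuples, comment lines are not modelled, OPTAB holds only the four opcodes used here, and the intermediate file is an in-memory list.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}  # subset only

SOURCE = [  # (label, opcode, operand); None means the field is empty
    ("MAIN", "START", "2000"),
    ("BEGIN", "LDA", "NUM1"),
    (None, "STA", "NUM2"),
    (None, "LDCH", "CHAR1"),
    (None, "STCH", "CHAR2"),
    ("NUM1", "WORD", "5"),
    ("NUM2", "RESW", "1"),
    ("CHAR1", "BYTE", "C'A'"),
    ("CHAR2", "RESB", "1"),
    (None, "END", "BEGIN"),
]


def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE constant such as C'EOF' or X'05'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else (len(body) + 1) // 2


def pass1(source):
    symtab, intermediate, errors = {}, [], []
    lines = iter(source)
    label, opcode, operand = next(lines)
    start = locctr = 0
    if opcode == "START":
        start = locctr = int(operand, 16)
        intermediate.append((None, label, opcode, operand))
        label, opcode, operand = next(lines)
    while opcode != "END":
        if label is not None:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        intermediate.append((locctr, label, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
        label, opcode, operand = next(lines)
    intermediate.append((locctr, label, opcode, operand))
    return symtab, intermediate, locctr - start, errors


symtab, intermediate, length, errors = pass1(SOURCE)
for name, address in symtab.items():
    print(f"{name:<6} {address:04X}")
print(f"program length = {length:04X}")
```

The program length is the final LOCCTR value minus the starting address, which is the quantity written into the Header record.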

Pass 2 of a two pass assembler

Step 1: Read the first line from the intermediate file.

Step 2: Check to see if the opcode field in the input line is “START”, if so then write the line

onto the final output file.

Step 3: Repeat the following for the other lines in the intermediate file until the opcode field contains the END directive.

1. If there is a symbol in the operand field, then the object code is assembled by combining the machine code equivalent of the instruction with the symbol address.

2. If there is no symbol in the operand field, then the operand address is assigned as zero and it is assembled with the machine code equivalent of the instruction.

3. If the opcode field is BYTE or WORD, then convert the constant in the operand field to object code. RESB and RESW only reserve storage and generate no object code.

4. Write the input line along with the object code onto the final output file.

Step 4: Close all the opened files and exit.

Algorithm

Begin
    Read first input line {from intermediate file}
    if OPCODE = 'START' then
    Begin
        Write listing line
        Read next input line
    End {if START}
    Write Header record to object program
    Initialize first Text record
    While OPCODE ≠ 'END' do
    Begin
        if this is not a comment line then
        Begin
            Search OPTAB for OPCODE
            if found then
            Begin
                if there is a symbol in OPERAND field then
                Begin
                    Search SYMTAB for OPERAND
                    if found then
                        Store symbol value as operand address
                    else
                    Begin
                        Store 0 as operand address
                        Set error flag (undefined symbol)
                    End
                End {if symbol}
                else
                    Store 0 as operand address
                Assemble the object code instruction
            End {if opcode found}
            else if OPCODE = 'BYTE' or 'WORD' then
                Convert constant to object code
            if object code will not fit into the current Text record then
            Begin
                Write Text record to object program
                Initialize new Text record
            End
            Add object code to Text record
        End {if not comment}
        Write listing line
        Read next input line
    End {while not END}
    Write last Text record to object program
    Write End record to object program
    Write last listing line
End {pass 2}

Example

Input – Assembly Language Program with address

MAIN START 2000

2000 BEGIN LDA NUM1

2003 ** STA NUM2

2006 ** LDCH CHAR1

2009 ** STCH CHAR2

200C NUM1 WORD 5

200F NUM2 RESW 1

2012 CHAR1 BYTE C’A’

2013 CHAR2 RESB 1

2014 END BEGIN

Input – Symbol table

BEGIN 2000

NUM1 200C

NUM2 200F

CHAR1 2012

CHAR2 2013

Output – Object Code

MAIN START 2000

2000 BEGIN LDA NUM1 00200C

2003 STA NUM2 0C200F

2006 LDCH CHAR1 502012

2009 STCH CHAR2 542013

200C NUM1 WORD 5 000005

200F NUM2 RESW 1

2012 CHAR1 BYTE C'A' 41

2013 CHAR2 RESB 1

2014 END BEGIN
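The translation step of Pass 2 can be sketched in Python for the example program. This is an illustration rather than a complete assembler: the intermediate file and SYMTAB are given as literals, indexed addressing and Text record packing are left out, and the function name `assemble_line` is hypothetical.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}  # subset only
SYMTAB = {"BEGIN": 0x2000, "NUM1": 0x200C, "NUM2": 0x200F,
          "CHAR1": 0x2012, "CHAR2": 0x2013}

INTERMEDIATE = [  # (address, opcode, operand) as left behind by Pass 1
    (0x2000, "LDA", "NUM1"),
    (0x2003, "STA", "NUM2"),
    (0x2006, "LDCH", "CHAR1"),
    (0x2009, "STCH", "CHAR2"),
    (0x200C, "WORD", "5"),
    (0x200F, "RESW", "1"),
    (0x2012, "BYTE", "C'A'"),
    (0x2013, "RESB", "1"),
]


def assemble_line(opcode: str, operand: str, errors: list) -> str:
    """Return the object code for one line as hex text ('' if none)."""
    if opcode in OPTAB:
        address = 0
        if operand:
            if operand in SYMTAB:
                address = SYMTAB[operand]
            else:
                errors.append(f"undefined symbol {operand}")
        return f"{OPTAB[opcode]:02X}{address:04X}"
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"
    if opcode == "BYTE":
        body = operand[2:-1]
        if operand[0] == "C":
            return "".join(f"{ord(ch):02X}" for ch in body)
        return body.upper()  # X'..' constants are already hexadecimal
    return ""  # RESW and RESB reserve space but generate no object code


errors: list = []
listing = [(addr, op, arg, assemble_line(op, arg, errors))
           for addr, op, arg in INTERMEDIATE]
for addr, op, arg, code in listing:
    print(f"{addr:04X} {op:<5} {arg:<6} {code}")
```

Each instruction's object code is the one-byte opcode followed by the operand's address from SYMTAB; a one-character BYTE constant yields a single byte.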

2.4 Machine Dependent Assembler Features

Register-to-register instructions are shorter and do not require another memory

reference. Use register-to-register instructions instead of register-to-memory

instructions whenever possible.

Most register-to-memory instructions are assembled using either program-counter relative addressing or base relative addressing.

If the required displacement is too large for either form of relative addressing, then the 4-byte extended instruction format must be used. The programmer must specify the 4-byte format by adding the prefix + to the operation code in the source statement.

If the displacement required for program-counter relative addressing is out of range, the assembler then attempts to use base relative addressing. If neither form of relative addressing is applicable and extended format was not specified, then the instruction cannot be properly assembled and the assembler must generate an error message.

The assembler directive BASE is used in conjunction with base relative addressing.

Indirect addressing is indicated by adding the prefix @ to the operand. Immediate

addressing is specified with the prefix # to the immediate operands.

Two important machine-dependent assembler features are:

Instruction Format and Addressing Modes

Program Relocation

(i) Instruction Format and Addressing Modes

Instruction Format: If the required displacement is too large for a 3-byte instruction, the 4-byte extended instruction format must be used. The programmer must specify the 4-byte format by adding the prefix + to the operation code in the source statement. If extended format is not specified, the assembler first attempts to translate the instruction using program-counter relative addressing.

Addressing Modes: Most register-to-memory instructions are assembled using either program-counter relative addressing or base relative addressing. Immediate addressing is specified with the prefix # on the immediate operand. If neither program-counter relative nor base relative addressing can be used, the 4-byte extended instruction format, which has a 20-bit address field, must be used.

(a) Program counter relative addressing

Consider the assembly language,

LINE LOCCTR LABEL OPCODE OPERANDS

10 0000 FIRST STL RETADR

12 0003 LDB #LENGTH

13 BASE LENGTH

15 0006 CLOOP +JSUB RDREC

40 0017 J CLOOP

95 0030 RETADR RESW 1

100 0033 LENGTH RESW 1

125 1036 RDREC CLEAR X

133 103C +LDT #4096

160 104E STCH BUFFER, X

Example1: Consider the statement in Line 10. During the execution of instructions on SIC,

the program counter is advanced after the instruction is fetched and before it is executed.

RETADR is assigned the address 0030. Now, calculate the displacement value.

For program counter addressing,

TA= (PC) + Disp

Target address (TA)=RETADR=030 and (PC)=003

(PC) = address of the next instruction of line number 10.

Now displacement = TA-(PC) = 030-003

TA= 030= 0000 0011 0000

(PC)= 003= 0000 0000 0011 (subtract)

Disp= 0000 0010 1101 = 02D

The instruction format for this instruction using program counter addressing is shown

below.

Field    Opcode   n  i  x  b  p  e  Disp
Bits     6        1  1  1  1  1  1  12
Value    000101   1  1  0  0  1  0  0000 0010 1101

Object code (hexadecimal): 17202D

Example2: Consider the statement in line 40, a jump to the label CLOOP, which is defined at address 0006. For program counter addressing,

TA= (PC) + Disp

Target address (TA)=CLOOP=006 and (PC)=01A

(PC) = address of the next instruction of line number 40.

Now displacement = TA-(PC) = 006-01A

TA= 006= 0000 0000 0110

(PC)= 01A= 0000 0001 1010 (subtract)

Disp= 1111 1110 1100 = FEC

The instruction format for this instruction using program counter addressing is shown

below.

Field    Opcode   n  i  x  b  p  e  Disp
Bits     6        1  1  1  1  1  1  12
Value    001111   1  1  0  0  1  0  1111 1110 1100

Object code (hexadecimal): 3F2FEC
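Both displacement calculations can be checked with a small Python sketch. The function name is hypothetical, and the opcode values 14 (STL) and 3C (J) are taken from the SIC/XE instruction set; they agree with the opcode bit patterns shown in the two examples.

```python
def format3_pc_relative(opcode: int, target: int, pc: int, indexed: bool = False) -> str:
    """Assemble a format 3 instruction with n = i = 1 and p = 1."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of range for PC-relative addressing")
    first_byte = (opcode & 0xFC) | 0b11          # 6-bit opcode, then n = 1, i = 1
    xbpe = (0b1000 if indexed else 0) | 0b0010   # x as requested, b = 0, p = 1, e = 0
    word = (first_byte << 16) | (xbpe << 12) | (disp & 0xFFF)  # 12-bit two's complement
    return f"{word:06X}"


print(format3_pc_relative(0x14, target=0x030, pc=0x003))  # STL RETADR (line 10)
print(format3_pc_relative(0x3C, target=0x006, pc=0x01A))  # J CLOOP (line 40)
```

Masking the displacement with 0xFFF is what turns the negative value 006 - 01A into the 12-bit two's complement pattern FEC.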

(b) Base Relative addressing

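Consider the statement in line 160, STCH BUFFER, X, which stores the character in the rightmost byte of the accumulator at BUFFER, indexed by register X. BUFFER is assigned the address 0036 (its definition is not shown in the listing above), which is too far from the instruction for program-counter relative addressing. The base register holds the address of LENGTH, 0033: line 12 loads it and the BASE directive in line 13 informs the assembler. For this instruction, the displacement is calculated using base relative addressing:

TA = (B) + Disp

Target address (TA) = BUFFER = 036 and (B) = 033

Now displacement = TA - (B) = 036 - 033 = 003

With n = i = 1, x = 1 and b = 1, the assembled instruction is 57C003.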
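The order in which the assembler tries the two relative modes can be sketched as follows. The function name `choose_displacement` is hypothetical, and 54 is the STCH opcode used earlier in this unit; the sketch covers only format 3 with n = i = 1.

```python
from typing import Optional


def choose_displacement(target: int, pc: int, base: Optional[int]):
    """Return (mode, disp): try PC-relative first, then base relative."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    raise ValueError("neither relative mode applies; format 4 (+ prefix) is needed")


# STCH BUFFER,X at 104E: the next instruction is at 1051, B holds 0033.
mode, disp = choose_displacement(target=0x0036, pc=0x1051, base=0x0033)
first_byte = (0x54 & 0xFC) | 0b11                       # STCH opcode with n = i = 1
xbpe = 0b1000 | (0b0100 if mode == "base" else 0b0010)  # x = 1 plus b or p
print(mode, f"{(first_byte << 16) | (xbpe << 12) | disp:06X}")
```

PC-relative displacements are signed (-2048 to 2047), whereas base relative displacements are unsigned (0 to 4095), which is why the two range checks differ.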

(c) Immediate Addressing

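Consider the statement in line 12, LDB #LENGTH, which loads the address of LENGTH into the base register B. In immediate addressing, the operand value is assembled as part of the instruction rather than fetched from memory; for a constant operand such as #3, the value itself is placed in the displacement field. If the value fits into 12 bits, a format 3 instruction is used. Otherwise, a format 4 instruction is used, as in line 133 (+LDT #4096).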

(ii) Program Relocation

An object program that contains the information necessary to perform address modification at load time is called a relocatable program. The assembler identifies for the loader those parts of the object program that need modification. The memory addresses of operands must be modified according to the load address, while constant data must remain unchanged. To solve the relocation problem, the assembler writes Modification records. Program relocation is needed for the following reasons:

It is desirable to load and run several programs at the same time.

The system must be able to load programs into memory wherever there is room.

The exact starting address of the program is not known until load time.

The modification record

The assembler produces a modification record describing the address and length of an

address field to be modified. The loader will add the beginning address of the loaded

program to the address field specified by a modification record.

Col. 1 M

Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the program

Col. 8-9 Length of the address field to be modified, in half-bytes
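A rough Python sketch shows how an assembler might emit a Modification record and how a loader might apply it. The function names and the load address 5000 are assumptions for illustration; the instruction bytes 4B101036 are what +JSUB RDREC (line 15, address 0006, RDREC at 1036) assembles to in format 4, assuming the SIC/XE opcode 48 for JSUB. The `^` separator is for readability only.

```python
def modification_record(field_start: int, half_bytes: int) -> str:
    """M record: start of the address field and its length in half-bytes."""
    return f"M^{field_start:06X}^{half_bytes:02X}"


def relocate(memory: bytearray, field_start: int, half_bytes: int, load_address: int) -> None:
    """Add the load address to the address field named by one M record."""
    n_bytes = (half_bytes + 1) // 2
    raw = int.from_bytes(memory[field_start:field_start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1        # the low-order half-bytes form the field
    field = (raw & mask) + load_address
    raw = (raw & ~mask) | (field & mask)      # leave the other bits untouched
    memory[field_start:field_start + n_bytes] = raw.to_bytes(n_bytes, "big")


# +JSUB RDREC assembled at 0006 as 4B101036; its 20-bit address field starts at byte 0007.
program = bytearray.fromhex("000000000000" + "4B101036")
print(modification_record(0x000007, 5))
relocate(program, 0x000007, 5, load_address=0x5000)
print(program[6:10].hex().upper())
```

Because the field is five half-bytes long, it starts in the middle of a byte; the mask keeps the flag bits in the upper half of that byte unchanged while the 20-bit address is updated.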

Example

Figure 2.3 – Program Relocation

2.5 Machine Independent Assembler Features

• Literals

• Symbol-defining statements

• Expressions

• Program blocks

(i) Literals

It is convenient for the programmer to be able to write the value of a constant operand as a

part of the instruction that uses it. Such an operand is called a literal

45 001A ENDFIL LDA =C'EOF' 032010

In this assembler language notation, a literal is identified with the prefix =, which is followed by a specification of the literal value.

The difference between a literal and an immediate operand

With immediate addressing, the operand value is assembled as a part of the machine instruction.

55 0020 LDA #3 010003

With a literal, the assembler generates the specified value as a constant at some other memory location. The address of this generated constant is used as the target address for the machine instruction.

45 001A ENDFIL LDA =C'EOF' 032010

Literal pool

• All of the literal operands used in the program are gathered together into one or

more literal pools.

• Normally literals are placed into a pool at the end of the program.

• Sometimes, it is desirable to place literals into a pool at some other location in the

object program.

– LTORG directive is introduced for this purpose.

– When the assembler encounters a LTORG, it creates a pool that contains all of

the literals used since the previous LTORG.

Literal for current value of location counter

The current value of the location counter can be used as a literal operand by writing =*.

BASE *

LDB =*

If the literal =* is used more than once in a program, the occurrences have identical names but different values, and each must appear in the literal pool.

Handling duplicate literal

The assembler should avoid storing duplicate literals. The easiest way to recognize

duplicate literals is by comparison of the character strings defining them.

For example,

215 1062 WLOOP TD =X’05’

230 106B WD =X’05’

In this case, the literal X'05' is used twice. To avoid duplication, its details are entered into the literal table LITTAB only once.

Literal table (LITTAB)

The basic data structure needed to process literal operands is a literal table (LITTAB). LITTAB is often organized as a hash table, using the literal name or value as the key. Each entry consists of the literal name, its hexadecimal value, its length and the address assigned to it.

Example

Literal   Hexadecimal value   Length   Address
C'EOF'    454F46              3        002D
X'05'     05                  1        1076

Implementation of Literals

Pass 1

For each recognized literal operand, search LITTAB. If the literal is already present in the table, no action is needed; if it is not present, the literal is added to LITTAB without assigning its address.

When an LTORG statement or the end of the program is encountered, the assembler scans LITTAB and assigns an address to each literal that does not yet have one.

The location counter is updated to reflect the number of bytes occupied by each literal.

Pass 2

Search LITTAB for each literal operand encountered.

The data values specified by the literals in each literal pool are inserted at the appropriate places in the object program, in the same way as values generated by BYTE or WORD statements.

If a literal value represents an address in the program, the assembler must generate

the appropriate Modification record.
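The Pass 1 handling of literals can be sketched with a small Python class. The class and method names are hypothetical, and for simplicity both literals are placed in one pool starting at 002D, so the address given to X'05' differs from the one in the LITTAB example, where that literal belongs to a later pool.

```python
def literal_value(literal: str) -> bytes:
    """Convert C'...' or X'...' to the bytes it stands for."""
    body = literal[2:-1]
    return body.encode("ascii") if literal[0] == "C" else bytes.fromhex(body)


class LitTab:
    def __init__(self):
        self.entries = {}   # literal text -> {"value", "length", "address"}

    def add(self, literal: str) -> None:
        """Pass 1: record a literal once; its address is not known yet."""
        if literal not in self.entries:
            value = literal_value(literal)
            self.entries[literal] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: give every unplaced literal an address."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr   # updated location counter


littab = LitTab()
for operand in ["=C'EOF'", "=X'05'", "=X'05'"]:
    littab.add(operand[1:])           # strip the leading '='
new_locctr = littab.place_pool(0x002D)
for name, e in littab.entries.items():
    print(name, e["value"].hex().upper(), e["length"], f"{e['address']:04X}")
```

Using the literal's text as the dictionary key is the character-string comparison described above: the second =X'05' finds an existing entry and adds nothing.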

(ii) Symbol-Defining Statements

EQU directives

Assemblers provide an assembler directive EQU that allows the programmer to define symbols and specify their values. The general syntax of EQU is:

symbol EQU value

When the assembler encounters the EQU statement, it enters “symbol” into SYMTAB with the specified “value”.

Use of EQU

(i) Establish symbolic names that can be used for improved readability in place of numeric values. For example, the first statement below can be replaced by the two statements that follow it.

+LDT #4096

MAXLEN EQU 4096

+LDT #MAXLEN

(ii) Define mnemonic names for registers.

A EQU 0

X EQU 1

L EQU 2

(iii) Establish and use names that reflect the logical function of the registers in the program.

BASE EQU R1

COUNT EQU R2

INDEX EQU R3

ORG directives

The assembler directive ORG is usually used to indirectly assign values to symbols. The

general syntax to use the ORG directive is:

ORG value

“value” is a constant or an expression involving constants and previously defined symbols.

When this statement is encountered, the assembler resets its location counter (LOCCTR) to the specified value. Since the values of symbols are taken from LOCCTR, the ORG statement affects the values of all labels defined until the next ORG.

Use ORG for label definition

Suppose that we want to define a table with the following structure.

SYMBOL field - 6 bytes

VALUE field - 3 bytes or 1 word

FLAG field - 2 bytes

The SYMBOL field contains user-defined symbols. The VALUE field holds the value assigned to the symbol, and the FLAGS field specifies the symbol type and other information.

To reserve the space for the symbol table, we can write

STAB RESB 1100

In total, 1100 bytes are reserved for 100 entries of 11 bytes each.

The EQU directive can be used to define labels for the fields SYMBOL, VALUE and FLAGS using the following statements:

STAB RESB 1100

SYMBOL EQU STAB

VALUE EQU STAB+6

FLAGS EQU STAB+9

STAB (100 Entries)

We can fetch the VALUE field from the table entry indicated by the contents of register X

using

LDA VALUE, X

The same symbol definition using ORG is as follows:

STAB RESB 1100

ORG STAB

SYMBOL RESB 6

VALUE RESW 1

FLAGS RESB 2

ORG STAB+1100

The first ORG resets LOCCTR to the value of STAB, and the last ORG sets LOCCTR back to its previous value.

Restrictions of EQU and ORG in an ordinary two-pass assembler

For an ordinary two-pass assembler, all symbols must be defined during Pass 1. Hence, the

following sequences could not be processed by an ordinary two-pass assembler. All terms

used to specify the value of the new symbol must have been defined previously in the

program.

Example1:

ALPHA EQU BETA

BETA EQU DELTA

DELTA RESW 1 (not valid)

Example2:

ORG ALPHA

BYTE1 RESB 1

BYTE2 RESB 1

BYTE3 RESB 1

ORG

ALPHA RESB 1 (not valid)

Example3:

BETA EQU ALPHA

ALPHA RESW 1 (not valid)

Example4:

ALPHA RESW 1

BETA EQU ALPHA (valid)
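The restriction illustrated by Examples 3 and 4 can be sketched in Python. The function `process_equ` is hypothetical and handles only a single constant or a single symbol as the operand; the address 0 given to ALPHA is an arbitrary assumption.

```python
def process_equ(symtab: dict, symbol: str, operand: str) -> bool:
    """Define `symbol` from `operand`; fail if it uses an undefined symbol."""
    if operand.isdigit():
        symtab[symbol] = int(operand)
        return True
    if operand in symtab:            # defined earlier in the same pass
        symtab[symbol] = symtab[operand]
        return True
    return False                     # forward reference: cannot be resolved


# Example 3: BETA EQU ALPHA appears before ALPHA is defined.
symtab = {}
ok_forward = process_equ(symtab, "BETA", "ALPHA")

# Example 4: ALPHA RESW 1 is processed first (address 0 assumed here).
symtab = {"ALPHA": 0x0000}
ok_backward = process_equ(symtab, "BETA", "ALPHA")
print(ok_forward, ok_backward)
```

In Pass 1 the assembler sees each statement only once, so a symbol on the right-hand side of EQU must already be in SYMTAB when the statement is reached.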

(iii) Expressions

Most assemblers allow the use of expressions wherever a single operand such as a label or literal is permitted. Each such expression must be evaluated by the assembler to produce a single operand address or value. Assemblers generally allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /. Individual terms in the expression may be constants, user-defined symbols, or special terms. The most common special term is the current value of the location counter (often designated by *).

Types of terms

Absolute terms -> The value of an absolute term is independent of program location.

Relative terms -> The value of a relative term is dependent on the beginning address of the

program.

Types of expressions

By the type of value produced, expressions can be classified as follows.

Absolute expressions: The value of an absolute expression is independent of the program location. An absolute expression may contain relative terms provided the relative terms occur in pairs and the terms in each such pair have opposite signs. No relative term can enter into a multiplication or division operation.

•e.g. MAXLEN EQU BUFEND-BUFFER

Relative expressions: The value of a relative expression is relative to the beginning address of the object program. Expressions that are neither relative nor absolute should be flagged by the assembler as errors. Relative expressions can be written as

S+r

where S is the starting address of the program

r is the relative term related to the beginning of the program.
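One way to classify an expression is to count its relative terms with their signs. The sketch below is an illustration under stated assumptions: only + and - are modelled, an expression is given as a list of (sign, name) pairs, and the function name and the table of which symbols are relative are hypothetical. It also relies on the usual rule that a relative expression has exactly one unpaired relative term, and that term is positive.

```python
def classify(terms, symtab_relative):
    """terms: list of (sign, name) with sign +1 or -1; constants are digit strings.

    Returns 'absolute', 'relative', or 'error'.
    """
    net = 0
    for sign, name in terms:
        if not name.isdigit() and symtab_relative.get(name, False):
            net += sign
    if net == 0:
        return "absolute"     # relative terms cancel in pairs
    if net == 1:
        return "relative"     # exactly one unpaired, positive relative term
    return "error"            # e.g. BUFEND + BUFFER or 100 - BUFFER


RELATIVE = {"BUFFER": True, "BUFEND": True, "MAXLEN": False}
print(classify([(+1, "BUFEND"), (-1, "BUFFER")], RELATIVE))
print(classify([(+1, "BUFFER"), (+1, "3")], RELATIVE))
print(classify([(+1, "BUFEND"), (+1, "BUFFER")], RELATIVE))
```

In BUFEND-BUFFER the starting address S appears once with each sign and cancels, which is why the result does not depend on where the program is loaded.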

(iv) Program Blocks

Program blocks are segments of code that are rearranged within a single object program unit. The assembler directive USE indicates which block the statements that follow belong to. The general syntax of the USE directive is:

USE [block name]

In the example considered here, the assembly language program uses three blocks:

o Unnamed Program block (default block)

o CDATA block

o CBLKS block

At the beginning, statements are assumed to be part of the unnamed (default) block. If no

USE statements are included, the entire program belongs to this single block. Each program

block may actually contain several separate segments of the source program.

Implementation of Program Blocks

Pass 1

Each program block has a separate location counter

Each label is assigned an address that is relative to the start of the block that

contains it

At the end of Pass 1, the latest value of the location counter for each block indicates

the length of that block

The assembler can then assign to each block a starting address in the object

program

Pass 2

The address of each symbol can be computed by adding the assigned block starting

address and the relative address of the symbol to that block
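The address arithmetic for program blocks can be sketched as follows. The block lengths and the two symbol entries are hypothetical values chosen for illustration, and the function names are not part of any real assembler.

```python
def assign_block_starts(block_lengths: dict) -> dict:
    """Lay the blocks out one after another, in order of first appearance."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """Pass 2: block start address + address relative to the block."""
    block, relative = symtab[symbol]
    return starts[block] + relative


# Hypothetical lengths left in each block's location counter after Pass 1.
lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
symtab = {"FIRST": ("(default)", 0x0000), "LENGTH": ("CDATA", 0x0003)}
starts = assign_block_starts(lengths)
print({name: f"{addr:04X}" for name, addr in starts.items()})
print(f"{absolute_address('LENGTH', symtab, starts):04X}")
```

SYMTAB stores each symbol's block together with its block-relative address, so the final address can only be computed after Pass 1 has fixed every block's length.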

2.6 One Pass Assembler

A one-pass assembler is used when it is necessary or desirable to avoid a second pass over the source program. A one-pass assembler scans the program just once. The main problem in trying to assemble a program in one pass involves forward references. Forward references to data items can be avoided by requiring that all storage reservation statements be defined before they are referenced. But forward references to labels on instructions cannot be eliminated as easily, because the logic of the program often needs a forward jump. The one-pass assembler must therefore make some special provision for handling forward references.

Two types of one-pass assemblers

1. One type of one-pass assemblers produces object code directly in memory for immediate

execution.

• No object program is written out.

• No loader is needed.

2. The other type of one-pass assemblers produces the usual kind of object program for

later execution.

Load-and-go assembler

An assembler that does not write the object program out and does not need a loader is called a load-and-go assembler.

• It avoids the overhead of writing the object program out and reading it back in.

• It is useful in a system that is oriented toward program development and testing.

• A load-and-go assembler can be a one-pass assembler or a two-pass assembler.

Handling of forward references in one-pass load-and-go assembler

The assembler generates object code instructions as it scans the source program. If an instruction operand is a symbol that has not yet been defined:

the symbol is entered into the symbol table with a flag indicating that the symbol is undefined;

the operand address is omitted when the instruction is assembled;

the address of the operand field of that instruction is added to a list of forward references associated with the symbol table entry.

When the definition for a symbol is encountered, the forward reference list for that symbol

is scanned, and the proper address is inserted into any instructions previously generated.
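The forward-reference mechanism can be sketched in Python for a load-and-go setting. This is a simplified illustration: `memory` is a small byte array whose offset 0 stands for the load address, `None` marks an undefined symbol, and the function names are hypothetical.

```python
memory = bytearray(16)   # object code is assembled directly into memory
symtab = {}              # symbol -> address, or None while undefined
forward_refs = {}        # symbol -> addresses of operand fields to patch


def reference(symbol: str, operand_field: int) -> int:
    """Return the operand address, or 0 and remember the spot to patch later."""
    if symtab.get(symbol) is not None:
        return symtab[symbol]
    symtab.setdefault(symbol, None)                    # entered as undefined
    forward_refs.setdefault(symbol, []).append(operand_field)
    return 0


def define(symbol: str, address: int) -> None:
    """Record the definition and patch every instruction that was waiting."""
    symtab[symbol] = address
    for field in forward_refs.pop(symbol, []):
        memory[field:field + 2] = address.to_bytes(2, "big")


# Offset 0: LDA NUM1 (opcode 00); NUM1 is not yet defined, so bytes 1-2 are patched later.
memory[0] = 0x00
memory[1:3] = reference("NUM1", operand_field=1).to_bytes(2, "big")
define("NUM1", 0x200C)   # the label NUM1 is finally encountered
print(memory[0:3].hex().upper())
```

Any symbol that still has an entry in `forward_refs` when END is reached was referenced but never defined, which the assembler would report as an error.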

Algorithm

Begin
    Read first input line
    if OPCODE = 'START' then
    Begin
        Save #[OPERAND] as starting address
        Initialize LOCCTR to starting address
        Write listing line
        Read next input line
    End {if START}
    else
        Initialize LOCCTR to 0
    While OPCODE ≠ 'END' do
    Begin
        if this is not a comment line then
        Begin
            if there is a symbol in the LABEL field then
            Begin
                Search SYMTAB for LABEL
                if found then
                Begin
                    if <symbol value> is NULL then
                    Begin
                        Set <symbol value> to LOCCTR in SYMTAB
                        Scan the linked list of forward references for this symbol and store the symbol value as the operand address at each listed address
                        Delete the linked list
                    End
                End
                else
                    Insert (LABEL, LOCCTR) into SYMTAB
            End {if symbol}
            Search OPTAB for OPCODE
            if found then
            Begin
                Search SYMTAB for OPERAND
                if found then
                    if symbol value is not NULL then
                        Store symbol value as operand address
                    else
                        Insert a node with address LOCCTR at the end of the linked list for the symbol
                else
                    Insert (symbol name, NULL) into SYMTAB and start its linked list with a node with address LOCCTR
                Add 3 to LOCCTR
            End
            else if OPCODE = 'WORD' then
                Add 3 to LOCCTR and convert constant to object code
            else if OPCODE = 'RESW' then
                Add 3 * #[OPERAND] to LOCCTR
            else if OPCODE = 'RESB' then
                Add #[OPERAND] to LOCCTR
            else if OPCODE = 'BYTE' then
            Begin
                Find length of constant in bytes
                Add length to LOCCTR
                Convert constant to object code
            End
            if object code will not fit into current Text record then
            Begin
                Write Text record to object program
                Initialize new Text record
            End
            Add object code to Text record
        End {if not comment}
        Write listing line
        Read next input line
    End {while not END}
    Write last Text record to object program
    Write End record to object program
    Write last listing line
End {one pass}

Explanation

Step 1: Read the input line.

Step 2: Check to see if the opcode field in the input line is “START”.

1. Find if there is any operand field after START; initialize the LOCCTR to the

operand value.

2. Otherwise if there is no value in the operand field the LOCCTR is set to zero.

Step 3: Write the line onto the output file.

Step 4: Repeat the following for the other lines in the input file until the opcode field is END.

1. If there is a symbol in the label field:

i. Check the symbol table to see if it has already been stored and is marked as an undefined entry. If so, update the symbol table with the proper address, mark it as a defined entry, and insert the address into the instructions on its forward reference list.

ii. Otherwise, enter the symbol into the symbol table along with the memory address at which it is stored.

2. If there is an opcode in the opcode field:

i. Search the OPTAB to see if the opcode is present; if so, increment the location counter (LOCCTR) by three.

ii. Otherwise, handle the assembler directives:

a) If the opcode is WORD, increment the LOCCTR by three and convert the constant in the operand field to object code.

b) If the opcode is BYTE, increment the LOCCTR by the length of the constant in bytes and convert the constant in the operand field to object code.

c) If the opcode is RESW, increment the LOCCTR by three times the integer value of the operand. No object code is generated.

d) If the opcode is RESB, increment the LOCCTR by the integer value of the operand. No object code is generated.

3. If there is a symbol in the operand field:

i. Check the symbol table to see if it has already been stored with a defined address. If so, assemble the object code by combining the machine code equivalent of the instruction with the symbol address.

ii. Otherwise, enter the symbol into the symbol table if it is not already there, mark it as an undefined entry, and add the address of the instruction to its forward reference list.

4. If there is no symbol in the operand field, the operand address is set to zero and assembled with the machine code equivalent of the instruction.

5. Write the input line along with the object code to the output file.

Step 5: Close all the opened files and exit.
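The location counter updates in Step 4 can be sketched as a small Python function. This is only an illustration under stated assumptions: the OPTAB fragment, the function names, and the C'...' / X'...' notation for BYTE constants are assumed here rather than taken from the steps above.

```python
# Hypothetical OPTAB fragment; a real table would list every mnemonic.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "JSUB": 0x48}

def constant_length(operand):
    """Length in bytes of a BYTE constant written as C'...' or X'...' (assumed syntax)."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return len(body)              # one byte per character
    if kind == "X":
        return (len(body) + 1) // 2   # two hexadecimal digits per byte
    raise ValueError("unsupported constant: " + operand)

def locctr_increment(opcode, operand):
    """Number of bytes by which one statement advances LOCCTR."""
    if opcode in OPTAB:
        return 3                      # every instruction is 3 bytes long
    if opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)       # reserves words, generates no object code
    if opcode == "RESB":
        return int(operand)           # reserves bytes, generates no object code
    if opcode == "BYTE":
        return constant_length(operand)
    raise ValueError("invalid operation code: " + opcode)
```

For example, locctr_increment("BYTE", "C'EOF'") advances the counter by 3, while locctr_increment("RESB", "4096") advances it by 4096.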

Example

Source Program with object code



Memory Address Contents

1000 454F4600 00030000 00xxxxxx xxxxxxxx

1010 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx

.

.

2000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14

2010 100948-- --00100C 28100630 ----48--

2020 --3C2012

.

Symbol Table

2.7 Multi Pass Assembler

A multi-pass assembler translates an assembly language program into machine code (object code) in multiple passes. It is used to resolve forward references in symbol definitions, making as many passes as are needed to process those definitions. It is unnecessary for a multi-pass assembler to make more than two passes over the entire program. Instead, only the parts of the program involving forward references need to be processed in multiple passes. The method presented here can be used to process any kind of forward reference. With a two-pass assembler, the following symbol definitions cannot be allowed.

ALPHA EQU BETA

BETA EQU DELTA

DELTA RESW 1

This is because ALPHA and BETA cannot be defined in pass 1. Actually, if we allow multi-

pass processing, DELTA is defined in pass 1, BETA is defined in pass 2, and ALPHA is

defined in pass 3, and the above definitions can be allowed. This is the motivation for using

a multi-pass assembler.
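The pass-by-pass behaviour described above can be imitated with a short Python sketch. It is a simplified illustration, not a real assembler: it assumes every statement has a distinct label, supports only EQU with a single symbol operand and RESW, and uses a hypothetical starting address of 0x1000.

```python
STATEMENTS = [
    ("ALPHA", "EQU", "BETA"),
    ("BETA", "EQU", "DELTA"),
    ("DELTA", "RESW", "1"),
]

def resolve_by_repeated_passes(statements, start=0):
    """Scan the statements repeatedly until every label is defined."""
    symtab = {}            # label -> value
    defined_in_pass = {}   # label -> number of the pass that defined it
    passes = 0
    while len(symtab) < len(statements):
        passes += 1
        locctr = start
        progress = False
        for label, opcode, operand in statements:
            if opcode == "EQU":
                # An EQU can be evaluated only once its operand is defined.
                if label not in symtab and operand in symtab:
                    symtab[label] = symtab[operand]
                    defined_in_pass[label] = passes
                    progress = True
            elif opcode == "RESW":
                if label not in symtab:
                    symtab[label] = locctr
                    defined_in_pass[label] = passes
                    progress = True
                locctr += 3 * int(operand)
        if not progress:
            raise ValueError("undefined symbols remain")
    return symtab, defined_in_pass

symtab, defined_in_pass = resolve_by_repeated_passes(STATEMENTS, start=0x1000)
```

With these three statements, defined_in_pass records DELTA in pass 1, BETA in pass 2 and ALPHA in pass 3, matching the order given in the text.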

A multi-pass assembler uses the symbol table to store symbols that are not yet completely defined. The entry for an undefined symbol stores its defining expression, including the names and the number of undefined symbols that contribute to the calculation of its value. Each entry also keeps a list of the symbols whose values depend on this symbol. When a symbol becomes defined, its value is used to reevaluate all of the symbols kept in this list. This step is performed recursively.

Example

The following symbol-defining statements involve forward references.

HALFSZ EQU MAXLEN/2

MAXLEN EQU BUFEND-BUFFER

PREVBT EQU BUFFER-1

BUFFER RESB 4096

BUFEND EQU *

A standard two-pass assembler cannot resolve these definitions. The following figure displays the symbol table entries resulting from pass 1 processing of the first statement.

HALFSZ EQU MAXLEN/2

MAXLEN has not yet been defined, so the value of HALFSZ cannot be computed. The defining expression for HALFSZ is stored in the symbol table in place of its value. The entry &1 indicates that one symbol in the defining expression is undefined.

For the next statement,

MAXLEN EQU BUFEND-BUFFER

Two undefined symbols are involved in this definition: BUFEND and BUFFER. Both are entered into SYMTAB with lists indicating the dependence of MAXLEN upon them. The symbol table entries after the second statement are shown in the following figure.

The next figure shows the symbol table entries after the third statement

PREVBT EQU BUFFER-1

In this case, a new undefined symbol PREVBT is added to the symbol table. Its value depends on BUFFER, so PREVBT is added to the list of symbols that depend on BUFFER.


The following figure shows the symbol definitions for the statement

BUFFER RESB 4096

In this case, BUFFER symbol is defined. From this symbol definition, PREVBT can be

defined accordingly.

The next figure shows the complete symbol table entries after processing the statement

BUFEND EQU *

In this case, the symbol BUFEND is defined and is assigned the current value of LOCCTR. From this definition, MAXLEN and then HALFSZ can be evaluated. This completes the symbol definition process. If any symbols remain undefined at the end of the program, the assembler flags them as errors.
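The symbol table mechanism used in this example can be sketched in Python. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, each defining expression is restricted to one operator with two operands, and the address 0x1034 given to BUFFER is an assumed LOCCTR value, since the original figures are not reproduced here.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "/": operator.floordiv}

class MultiPassSymtab:
    def __init__(self):
        self.values = {}       # symbol -> value, for defined symbols
        self.pending = {}      # symbol -> (left, op, right) awaiting operands
        self.dependents = {}   # symbol -> symbols whose values depend on it

    def undefined_count(self, symbol):
        """Number of undefined operands in the expression (the &n count)."""
        left, _, right = self.pending[symbol]
        return sum(1 for x in (left, right)
                   if isinstance(x, str) and x not in self.values)

    def define_expr(self, symbol, left, op, right):
        """Process 'symbol EQU left op right'; operands are names or integers."""
        self.pending[symbol] = (left, op, right)
        for x in (left, right):
            if isinstance(x, str) and x not in self.values:
                self.dependents.setdefault(x, []).append(symbol)
        self._try_evaluate(symbol)

    def define_value(self, symbol, value):
        """Define symbol, then reevaluate everything on its dependency list."""
        self.values[symbol] = value
        for dep in self.dependents.pop(symbol, []):
            self._try_evaluate(dep)

    def _try_evaluate(self, symbol):
        if symbol in self.pending and self.undefined_count(symbol) == 0:
            left, op, right = self.pending.pop(symbol)
            get = lambda x: self.values[x] if isinstance(x, str) else x
            # Defining this symbol may in turn define its own dependents.
            self.define_value(symbol, OPS[op](get(left), get(right)))

st = MultiPassSymtab()
st.define_expr("HALFSZ", "MAXLEN", "/", 2)
st.define_expr("MAXLEN", "BUFEND", "-", "BUFFER")
st.define_expr("PREVBT", "BUFFER", "-", 1)
st.define_value("BUFFER", 0x1034)          # assumed LOCCTR at BUFFER RESB 4096
st.define_value("BUFEND", 0x1034 + 4096)   # BUFEND EQU * after the 4096 bytes
```

Defining BUFFER resolves PREVBT immediately, while MAXLEN still waits for BUFEND. Defining BUFEND then resolves MAXLEN, which in turn resolves HALFSZ through the recursive call.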

2.8 Implementation Examples – MASM Assembler

A MASM assembly language program is written as a collection of segments. Each segment is defined as belonging to a particular class, corresponding to its contents. Commonly used classes are CODE, DATA, CONST and STACK.

Segments are addressed via x86 segment registers during program execution. Code segments are addressed using the code segment register CS, and stack segments are addressed using the stack segment register SS. These segment registers are automatically set by the system loader when a program is loaded for execution. Register CS is set to indicate the segment that contains the starting address specified in the END statement of the program. Data segments, including constant segments, are addressed using DS, ES, FS, or GS. The segment register can be specified explicitly by the programmer. If the programmer does not specify a segment register, one is selected by the assembler.

By default, the assembler assumes that all references to data segments use register DS. This assumption can be changed using the assembler directive ASSUME. For example,

ASSUME ES:DATASEG1

tells the assembler to assume that register ES indicates the segment DATASEG1.

Jump instructions are assembled in two different ways, depending on whether the target of the jump is in the same code segment or in a different code segment. A near jump is a jump to a target in the same code segment; a far jump is a jump to a target in a different code segment.

A near jump is assembled using the current code segment register CS, whereas a far jump must be assembled using a different segment register, which is specified in an instruction prefix. A near jump instruction occupies 2 or 3 bytes of memory (depending on whether the jump address is within 128 bytes of the current instruction), whereas a far jump occupies 5 bytes of memory.

By default, MASM assumes that a forward jump is a near jump. If the target of the jump is in another code segment, the programmer must warn the assembler by writing

JMP FAR PTR TARGET

If the jump address is within 128 bytes of the current instruction, the programmer can specify the shorter (2-byte) near jump by writing

JMP SHORT TARGET

If the JMP to TARGET is a far jump and the programmer does not specify FAR PTR, a problem occurs. During pass 1, the assembler reserves 3 bytes for the jump instruction, but the actual assembled instruction requires 5 bytes. In earlier versions of MASM, this caused an error. In later versions, the assembler can repeat pass 1 to generate the correct location counter values.
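The jump sizes quoted above, and the pass 1 discrepancy, can be illustrated with a short Python sketch. The function and constant names are hypothetical, and the short-jump range is assumed to be a signed 8-bit displacement (-128 to +127); the byte counts 2, 3 and 5 are the ones given in the text.

```python
PASS1_RESERVED = 3   # bytes reserved in pass 1 for a forward jump assumed to be near

def jump_length(same_segment, displacement=0):
    """Bytes occupied by a JMP, using the sizes given in the text."""
    if not same_segment:
        return 5                       # far jump to another code segment
    if -128 <= displacement <= 127:
        return 2                       # short form of the near jump
    return 3                           # ordinary near jump

# A forward jump that turns out to be far: every later location counter
# value computed in pass 1 is too small by this many bytes.
shift = jump_length(same_segment=False) - PASS1_RESERVED
```

Here shift is 2, which is why the location counter values must be regenerated by repeating pass 1 when FAR PTR is omitted.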

The length of an instruction depends on the operands that are used. Immediate operands may occupy from 1 to 4 bytes in the instruction. An operand that specifies a memory location may take varying amounts of space in the instruction.

Segments in a MASM program can be written in more than one part. If a segment directive specifies the same name as a previously defined segment, it is considered to be a continuation of that segment.

References between the segments are handled by assembler and the external

references are handled by the linker.

MASM allows easy and efficient execution of the program in a variety of operating system environments. It can also produce an instruction timing listing that shows the number of clock cycles required to execute each machine instruction.
