Chih-Hung Wang Chapter 2: Assembler (Full) Leland L. Beck,
System Software: An Introduction to Systems Programming (3rd),
Addison-Wesley, 1997. 1
Slide 2
Role of Assembler Source Program Assembler Object Code Loader
Executable Code Linker 2
Slide 3
Chapter 2 -- Outline Basic Assembler Functions
Machine-dependent Assembler Features Machine-independent Assembler
Features Assembler Design Options 3
Slide 4
Introduction to Assemblers Fundamental functions Translating
mnemonic operation codes to their machine language equivalents
Assigning machine addresses to symbolic labels Machine dependency
Different machine instruction formats and codes 4
Slide 5
Example Program (Fig. 2.1) Purpose Reads records from input
device (code F1) Copies them to output device (code 05) At the end
of the file, writes EOF on the output device, then RSUB to the
operating system Program (See Fig. 2.1) 5
Slide 6
SIC Assembly Program (Fig. 2.1) Line numbers (for reference)
Address labels Mnemonic opcode operands comments 6
Slide 7
SIC Assembly Program (Fig. 2.1) Index addressing Indicate
comment lines 7
Slide 8
SIC Assembly Program (Fig. 2.1) 8
Slide 9
Example Program (Fig. 2.1) Data transfer (RD, WD) a buffer is
used to store record buffering is necessary for different I/O rates
the end of each record is marked with a null character (00 16 ) the
end of the file is indicated by a zero-length record Subroutines
(JSUB, RSUB) RDREC, WRREC save link register first before nested
jump 9
Slide 10
Assembler Directives Pseudo-Instructions Not translated into
machine instructions Providing information to the assembler Basic
assembler directives START : Specify name and starting address for
the program END : Indicate the end of the source program, and
(optionally) the first executable instruction in the program. BYTE
: Generate character or hexadecimal constant, occupying as many
bytes as needed to represent the constant. WORD : Generate one-word
integer constant RESB : Reserve the indicated number of bytes for a
data area RESW : Reserve the indicated number of words for a data
area 10
Slide 11
Object Program Header Col. 1H Col. 2~7Program name Col.
8~13Starting address (hex) Col. 14-19Length of object program in
bytes (hex) Text Col.1 T Col.2~7Starting address in this record
(hex) Col. 8~9Length of object code in this record in bytes (hex)
Col. 10~69Object code (69-10+1)/6=10 instructions End Col.1E
Col.2~7Address of first executable instruction (hex) (END
program_name) 11
Slide 12
Fig. 2.3 (Object Program) 12 1033-2038: Storage reserved by the
loader
Slide 13
Assembler Tasks The translation of source program to object
code requires us the accomplish the following functions: Convert
mnemonic operation codes to their machine language equivalents
(e.g. translate STL to 14 - Line 10) Convert symbolic operands to
their equivalent machine addresses format (e.g. translate RETARD to
1033 - Line 10) Build machine instructions in the proper format
Convert the data constants specified in the source program into
their internal machine representations (e.g. translate EOF to
454F46) - Line 80 Write object program and the assembly listing
13
Slide 14
Example of Instruction Assemble Forward reference STCH BUFFER,X
(54) 16 1 (001) 2 (039) 16 549039 14
Slide 15
Forward Reference A reference to a label (RETADR) that is
defined later in the program Solution Two passes First pass: does
little more than scan the source program for label definition and
assign addresses (such as those in the Loc column in Fig. 2.2).
Second pass: performs most of the actual instruction translation
previously defined. 15
Slide 16
Difficulties: Forward Reference Forward reference: reference to
a label that is defined later in the program.
LocLabelOperatorOperand 1000FIRSTSTLRETADR 1003CLOOPJSUBRDREC
1012JCLOOP 1033RETADRRESW1 16
Slide 17
Two Pass SIC Assembler Pass 1 (define symbols) Assign addresses
to all statements in the program Save the addresses assigned to all
labels for use in Pass 2 Perform assembler directives, including
those for address assignment, such as BYTE and RESW Pass 2
(assemble instructions and generate object program) Assemble
instructions (generate opcode and look up addresses) Generate data
values defined by BYTE, WORD Perform processing of assembler
directives not done during Pass 1 Write the object program and the
assembly listing 17
Slide 18
Two Pass SIC Assembler Read from input line LABEL, OPCODE,
OPERAND Pass 1Pass 2 Intermediate file Object codes Source program
OPTAB SYMTAB 18
Slide 19
Assembler Data Structures Operation Code Table (OPTAB) Symbol
Table (SYMTAB) Location Counter (LOCCTR) Source Object Program
Intermediate file Pass 1 Pass 2 OPTAB SYMTAB LOCCTR 19
Slide 20
Location Counter ( LOCCTR) A variable that is used to help in
the assignment of addresses, i.e., LOCCTR gives the address of the
associated label. LOCCTR is initialized to be the beginning address
specified in the START statement. After each source statement is
processed during pass 1, the length of assembled instruction or
data area to be generated is added to LOCCTR. 20
Slide 21
Operation Code Table ( OPTAB) Contents: Mnemonic operation
codes (as the keys) Machine language equivalents Instruction format
and length Note: SIC/XE has instructions of different lengths
During pass 1: Validate operation codes Find the instruction length
to increase LOCCTR During pass 2: Determine the instruction format
Translate the operation codes to their machine language equivalents
Implementation: a static hash table (entries are not normally added
to or deleted from it) Hash table organization is particularly
appropriate 21
Slide 22
SYMTAB Contents: Label name Label address Flags (to indicate
error conditions) Data type or length During pass 1: Store label
name and assigned address (from LOCCTR) in SYMTAB During pass 2:
Symbols used as operands are looked up in SYMTAB Implementation: a
dynamic hash table for efficient insertion and retrieval Should
perform well with non-random keys (LOOP1, LOOP2). COPY1000 FIRST
1000 CLOOP1003 ENDFIL1015 EOF1024 THREE102D ZERO1030 RETADR1033
LENGTH1036 BUFFER1039 RDREC2039 22
Slide 23
Fig. 2.2 (1) Program with Object code 23
Slide 24
Fig. 2.2 (2) Program with Object code 24
Slide 25
Fig. 2.2 (3) Program with Object code 25
Slide 26
Figure 2.1 (Pseudo code Pass 1) 26
Slide 27
Figure 2.1 (Pseudo code Pass 1) 27
Slide 28
Figure 2.1 (Pseudo code Pass 2) 28
Slide 29
Figure 2.1 (Pseudo code Pass 2) 29
Slide 30
SIC/XE Assembly Program indirect addressing immediate
addressing extended format 30
Slide 31
SIC/XE Assembly Program 31
Slide 32
SIC/XE Assembly Program 32
Slide 33
Benefits of SIC/XE Addressing Modes Register-to-register
instructions Shorter than register-to-memory instructions No memory
reference Immediate addressing mode No memory reference. The
operand is already present as part of the instruction Indirect
addressing mode Avoids the needs for another instruction Relative
addressing mode Shorten than the extended instruction Easy program
relocation 33
Slide 34
Considering Instruction Formats START directive specifies a
beginning program address of 0: a relocatable program.
Register-to-register instructions: simply convert the mnemonic name
to their number equivalents OPTAB: for opcodes SYMTAB: preloaded
with register names and their values 34
Considering Addressing Modes PC or base relative addressing
Calculate displacement Displacement must be small enough to fit in
the 12-bit field (- 2048..2047 for PC relative mode, 0..4095 for
base relative mode) Extended instruction format (4-byte) 20-bit
field for direct addressing 36
Slide 37
How Assembler Recognizes the Addressing Mode Extended
format:+op m Indirect addressing: op @m Immediate addressing: op #c
Index addressing: op m,X Relative addressing: op m 1st choice: PC
relative (arbitrarily chosen) 2nd choice: base relative (if
displacement is invalid in PC relative mode) 3rd choice: error
message (if displacement is invalid in both relative modes) 37
PC Relative Addressing Mode Instruction: 10 0000 FIRST STL
RETADR 17202D 12 0003 LDB #LENGTH 69202D : : 95 0030 RETADR RESW 1
(14) 16 1 1 0 0 1 0 (02D) 16 (17) 16 (2) 16 (02D) 16 PC is advanced
after each instruction is fetched and before it is executed. That
is, PC contains the address of the next instruction. disp = (0030)
16 -(0003) 16 = (002D) 16 43
Base Relative Addressing Mode Instruction: 12 0003 LDB #LENGTH
69202D 13 BASE LENGTH : : 100 0033 LENGTH RESW 1 105 0036 BUFFER
RESB 4096 : : 160 104E STCH BUFFER,X 57C003 (54) 16 1 1 1 1 0 0
(003) 16 (57) 16 (C) 16 (003) 16 disp = (0036) 16 -(0033) 16 =
(0003) 16 PC relative is no longer applicable BASE directive
explicitly informs the assembler that the base register will
contain the address of LENGTH (use NOBASE to invalidate) LDB loads
the address of LENGTH into base register during execution 45
Why Program Relocation To increase the productivity of the
machine Want to load and run several programs at the same time
(multiprogramming) Must be able to load programs into memory
wherever there is room Actual starting address of the program is
not known until load time 48
Slide 49
Absolute Program Program with starting address specified at
assembly time In the example of SIC assembly program The address
may be invalid if the program is loaded into some where else.
Instruction: 55 101B LDA THREE 00102D 49 Calculated from the
starting address 1000
Slide 50
Relocatable Program 50
Slide 51
Need to be modified: The address portion of those instructions
that use absolute (direct) addresses. Need not be modified:
Register-to-register instructions (no memory references) PC or
base-relative addressing (relative displacement remains the same
regardless of different starting addresses) What Needs to be
Relocated 51
Slide 52
For Assembler For an address label, its address is assigned
relative to the start of the program (thats why START 0) Produce a
modification record to store the starting location and the length
of the address field to be modified. For loader For each
modification record, add the actual beginning address of the
program to the address field at load time. How to Relocate
Addresses 52
Slide 53
Format of Modification Record One modification record for each
address to be modified The length is stored in half-bytes (20 bits
= 5 half-bytes) The starting location is the location of the byte
containing the leftmost bits of the address field to be modified.
If the field contains an odd number of half-bytes, the starting
location begins in the middle of the first byte. 53
Machine Independent Assembler Features Features are not closely
related to machine architecture. More related to issues about:
Programmer convenience Software environment Common examples:
Literals Symbol-defining statements Expressions Program blocks
Control sections Assembler directives are widely used to support
these features 55
Slide 56
Literals Literal is equivalent to: Define a constant explicitly
and assign an address label for it Use the label as the instruction
operand Why use literals: To avoid defining the constant somewhere
and making up a label for it Instead, to write the value of a
constant operand as a part of the instruction How to use literals:
A literal is identified with the prefix =, followed by a
specification of the literal value 56
Slide 57
Original Program 57
Slide 58
Using Literal 58
Slide 59
Object Program Using Literal The same as before 59
Slide 60
Original Program 60
Slide 61
Using Literal 61
Slide 62
Object Program Using Literal The same as before 62
Slide 63
Literal vs. Immediate Addressing Same: Operand field contains
constant values Difference: Immediate addressing: the assembler put
the constant value as part of the machine instruction Literal: the
assembler store the constant value elsewhere and put that address
as part of the machine instruction 63
Slide 64
Literal Pool All of the literal operands are gathered together
into one or more literal pools. literal pool: At the location where
the LTORG directive is encountered To keep the literal operand
close to the instruction that uses it At the end of the object
program, generated immediately following the END statement 64
Slide 65
Duplicate Literals Duplicate literals: The same literal used
more than once in the program Only one copy of the specified value
needs to be stored For example, =X05 in the example program How to
recognize the duplicate literals Compare the character strings
defining them Easier to implement, but has potential problem (see
next) E.g., =X05 Compare the generated data value Better, but will
increase the complexity of the assembler E.g., =CEOF and =X454F46
65
Slide 66
Problem of Duplicate-Literal Recognition using Character
Strings There may be some literals that have the same name, but
different values For example, the literal whose value depends on
its location in the program The value of location counter denoted
by * BASE * LDB=* The literal =* repeatedly used in the program has
the same name, but different values All this kind of literals have
to be stored in the literal pool 66
Slide 67
Implementation of Literal Data structure: a literal table
LITTAB Literal name Operand value and length Address LITTAB is
often organized as a hash table, using the literal name or value as
the key 67
Slide 68
Implementation of Literal Pass 1 As each literal operand is
recognized Search the LITTAB for the specified literal name or
value If the literal is already present, no action is needed
Otherwise, the literal is added to LITTAB (store the name, value,
and length, but not address) As LTORG or END is encountered Scan
the LITTAB For each literal with empty address field, assign the
address and update the LOCCTR accordingly 68
Slide 69
Implementation of Literal Pass 2 As each literal operand is
recognized Search the LITTAB for the specified literal name or
value If the literal is found, use the associated address as the
operand of the instruction Otherwise, error (should not happen) As
LTORG or END is encountered insert the data values of the literals
in the object program Modification record is generated if necessary
69
Slide 70
Symbol-Defining Statements How to define symbols and their
values Address label The label is the symbol name and the assigned
address is its value FIRST STL RETADR Assembler directive EQU
symbol EQU value This statement enters the symbol into SYMTAB and
assigns to it the value specified The value can be a constant or an
expression Assembler directive ORG ORG value 70
Slide 71
Use of EQU To improve the program readability, avoid using the
magic numbers, make it easier to find and change constant values
+LDT #4096 MAXLEN EQU 4096 +LDT #MAXLEN To define mnemonic names
for registers A EQU 0 X EQU 1 BASE EQU R1 COUNT EQU R2 71
Slide 72
Use of ORG Indirect value assignment: ORG value When ORG is
encountered, the assembler resets its LOCCTR to the specified value
ORG will affect the values of all labels defined until the next ORG
If the previous value of LOCCTR can be automatically remembered, we
can return to the normal use of LOCCTR by simply write ORG 72
Slide 73
Example of Using ORG Data structure SYMBOL: 6 bytes VALUE: 3
bytes (one word) FLAGS: 2 bytes Refer to every field of each entry
73
Slide 74
Not Using ORG We can fetch the VALUE field by LDA VALUE,X X =
0, 11, 22, for each entry 74 Offsets from STAB Less readable and
meaningful
Slide 75
Using ORG Size of field more meaningful Restore the LOCCTR to
its previous value Or only use ORG Set the LOCCTR to STAB 75
Slide 76
Forward reference is not allowed for EQU and ORG. That is, all
terms in the value field must have been defined previously in the
program. The reason is that all symbols must have been defined
during Pass 1 in a two-pass assembler. Allowed Not allowed
Forward-Reference Problem 76
Slide 77
Not allowed Forward-Reference Problem 77
Slide 78
Expressions A single term as an instruction operand can be
replaced by an expression. STAB RESB 1100 STAB RESB 11*100 STAB
RESB (6+3+2)*MAXENTRIES The assembler has to evaluate the
expression to produce a single operand address or value.
Expressions consist of Operator +,-,*,/ (division is usually
defined to produce an integer result) Individual terms Constants
User-defined symbols Special terms, e.g., *, the current value of
LOCCTR 78
Slide 79
Relocation Problem in Expressions Values of terms can be
Absolute (independent of program location) constants Relative (to
the beginning of the program) Address labels * (value of LOCCTR)
Expressions can be Absolute Only absolute terms Relative terms in
pairs with opposite signs for each pair Relative All the relative
terms except one can be paired as described in absolute. The
remaining unpaired relative term must have a positive sign. No
relative terms may enter into a multiplication or division
operation Expressions that do not meet the conditions of either
absolute or relative should be flagged as errors. 79
Slide 80
Absolute Expression Relative term or expression implicitly
represents (S+r) S: the starting address of the program r: value of
the term or expression relative to S For example BUFFER: S+r1
BUFEND: S+r2 The expression, BUFEND-BUFFER, is absolute. MAXLEN =
(S+r2)-(S+r1) = r2-r1 (no S here) MAXLEN means the length of the
buffer area Illegal expressions: BUFEND+BUFFER, 100-BUFFER,
3*BUFFER Values associated with symbols 80
Slide 81
Absolute or Relative To determine the type of an expression, we
must keep track of the types of all symbols defined in the program.
We need a flag in the SYMTAB for indication. 81
Slide 82
Program Blocks Collect many pieces of code/data that scatter in
the source program but have the same kind into a single block in
the generated object program. For example, code block, initialized
data block, un- initialized data block. (Like code, data segments
on a Pentium PC). Advantage: Because pieces of code are closer to
each other now, format 4 can be replaced with format 3, saving
space and execution time. Code sharing and data protection can
better be done. With this function, in the source program, the
programmer can put related code and data near each other for better
readability. 82
Slide 83
Advantages of Using Program blocks To satisfy the contradictive
goals: Separate the program into blocks in a particular order Large
buffer area is moved to the end of the object program Using the
extended format instructions or base relative mode may be reduced.
(lines 15, 35, and 65) Placement of literal pool is easier: simply
put them before the large data area, CDATA block. (line 253) Data
areas are scattered Program readability is better if data areas are
placed in the source program close to the statements that reference
them. 83
Slide 84
Program Block Example Default block. 84
Slide 85
Use the default block. 85
Slide 86
Use the default block. At the beginning of the program,
statements are assumed to be part of the unnamed (default) block.
The default block (unnamed) contains the executable instructions.
The CDATA block contains all data areas that are a few words or
less in length. The CBLKS block contain all data areas that consist
of large blocks of memory. 86
Slide 87
Job of Assembler A program block may contain several separate
segments of the source program. The assembler will (logically)
rearrange these segments to gather together the pieces of each
block. These blocks will then be assigned addresses in the object
program, with the blocks appearing in the same order in which they
were first begun in the source program. The result is the same as
if the programmer had physically rearranged the source statements
to group together all the source lines belonging to each block.
87
Slide 88
Assembler Processing (1) Pass 1: Maintain a separate location
counter for each program block. The location counter for a block is
initialized to 0 when the block is first begun. The current value
of this location counter is saved when switching to another block,
and the saved value is restored when resuming a previous block.
Thus, during pass 1, each label is assigned an address that is
relative to the beginning of the block that contains it. After pass
1, the latest value of the location counter for each block
indicates the length of that block. The assembler then can assign
to each block a starting address in the object program. 88
Slide 89
Assembler Processing (2) Pass 2 When generating object code,
the assembler needs the address for each symbol relative to the
start of the object program (not the start of an individual problem
block) This can be easily done by adding the location of the symbol
(relative to the start of its block) to the assigned block starting
address. 89
Slide 90
Figure 2.12 (a) There is no block number for MAXLEN. This is
because MAXLEN is an absolute symbol. 90
Slide 91
0063+3 91
Slide 92
Symbol Table After Pass 1 92
Slide 93
Object Code in Pass 2 20 00060LDA LENGTH032060 The SYMTAB shows
that LENGTH has a relative address 0003 within problem block 1
(CDATA). The starting address for CDATA is 0066. Thus the desired
target address is 0066 + 0003 = 0069. Because this instruction is
assembled using program counter-relative addressing, and PC will be
0009 when the instruction is executed (the starting address for the
default block is 0), the displacement is 0069 0009 = 60. 93
Slide 94
Advantages Because the large buffer area is moved to the end of
the object program, we no longer need to use format 4 instructions
on line 15, 35, and 65. For the same reason, use of the base
register is no longer necessary; the LDB and BASE have been
deleted. Code sharing and data protection can be more easily
achieved. 94
Slide 95
Object Code (Figure 2.13) Although the assembler internally
rearranges code and data to form blocks, the generated code and
data need not be physically rearranged. The assembler can simple
write the object code as it is generated during pass 2 and insert
the proper load address in each text record. 95
Slide 96
Leave the Job to Loader No code need to be generated for these
two blocks. We just need to reserve space for them. 96
Slide 97
Control Section A control section is a part of the program that
maintains its identity after assembly. Each such control section
can be loaded and relocated independently of the others. (Main
advantage) Different control sections are often used for
subroutines or other logical subdivisions of a program. The
programmer can assemble, load, and manipulate each of these control
sections separately. 97
Slide 98
Program Linking Instructions in one control section may need to
refer to instructions or data located in another control section.
(Like external variables used in C language) Thus, program
(actually, control section) linking is necessary. Because control
sections are independently loaded and relocated, the assembler is
unable to know a symbols address at assembly time. This job can
only be delayed and performed by the loader. We call the references
that are between control sections external references. The
assembler generates information for each external reference that
will allow the loader to perform the required linking. 98
Slide 99
Control Section Example Default control section 99
Slide 100
A new control section 100
Slide 101
A new control section 101
Slide 102
External References Symbols that are defined in one control
section cannot be used directly by another control section. They
must be identified as external references for the loader to handle.
Two assembler directives are used: EXTDEF (external definition)
Identify those symbols that are defined in this control section and
can be used in other control sections. Control section names are
automatically considered as external symbols. EXTREF (external
reference) Identify those symbols that are used in this control
section but defined in other control sections. 102
Slide 103
Code Involving External Reference (1) 150003CLOOP +JSUB RDREC
4B100000 The operand (RDREC) is named in the EXTREF statement,
therefore this is an external reference. Because the assembler has
no idea where the control section containing RDREC will be loaded,
it cannot assemble the address for this instruction. Therefore, it
inserts an address of zero. Because the RDREC has no predictable
relationship to anything in this control section, relative
addressing cannot be used. Instead, an extended format instruction
must be used. This is true of any instruction whose operand
involves an external reference. 103
Slide 104
Code Involving External Reference (2) 1600017+STCH
BUFFER,X57900000 This instruction makes an external reference to
BUFFER. The instruction is thus assembled using extended format
with an address of zero. The x bit is set to 1 to indicate indexed
addressing. 104
Slide 105
Code Involving External Reference (3) 1900028MAXLEN WORD BUFEND
BUFFER 000000 The value of the data word to be generated is
specified by an expression involving two external references. As
such, the assembler stores this value as zero. When the program is
loaded, the loader will add to this data area the address of BUFEND
and subtract from it the address of BUFFER, which then results in
the desired value. Notice the difference between line 190 and 107.
In line 107, EQU can be used because BUFEND and BUFFER are defined
in the same control section and thus their difference can be
immediately calculated by the assembler. 105
Slide 106
Figure 2.16 Program Object Code (1) 106
Slide 107
Figure 2.16 Program Object Code (2) 107
Slide 108
Figure 2.16 Program Object Code (3) 108
Slide 109
External Reference Processing The assembler must remember (via
entries in SYMTAB) in which control section a symbol is defined.
Any attempt to refer to a symbol in another control section must be
flagged as an error unless the symbol is identified (via EXTREF) as
an external reference. The assembler must allow the same symbol to
be used in different control sections. E.g., the conflicting
definitions of MAXLEN on line 107 and 190 should be allowed.
109
Slide 110
Two New Record Types (1) We need two new record types in the
object program and a change in the previous defined modification
record type. Define record Give information about external symbols
that are defined in this control section Refer record List symbols
that are used as external references by this control section.
110
Slide 111
111 Two New Record Types (2)
Slide 112
Revised Modification Record 112
Slide 113
Object Program (Figure 2.17) 113
Slide 114
Program Relocation The modified modification record can still
be used for program relocation. Program name 114
Slide 115
More Restriction on Expression Previously we required that all
of the relative terms in an expression be paired to make the
expression an absolute expression. With control sections, the above
requirement is not enough. We must require that both terms in each
pair must be relative within the same control section. BUFEND-
BUFFER (allowed) because they are defined in the same control
section. On the other hand, RDRED COPY (not allowed) because the
value is unpredictable. How to enforce this restriction When an
expression involves external references, the assembler cannot
determine whether or not the expression is legal. The assembler
evaluates all of the terms it can, combines these to form an
initial expression value, and generates Modification records. The
loader checks the expression for errors and finishes the
evaluation. 115
Slide 116
Assembler Design Options - One and Multi-Pass Assembler So far,
we have presented the design and implementation of a two-pass
assembler. Here, we will present the design and implementation of
One-pass assembler If avoiding a second pass over the source
program is necessary or desirable. Multi-pass assembler Allow
forward references during symbol definition. 116
Slide 117
One-Pass Assembler The main problem is about forward reference.
Eliminating forward reference to data items can be easily done.
Simply ask the programmer to define variables before using them.
However, eliminating forward reference to instruction cannot be
easily done. Sometimes your program needs a forward jump. Asking
your program to use only backward jumps is too restrictive.
117
Slide 118
Program Example 118
Slide 119
119
Slide 120
All variables are defined before they are used. 120
Slide 121
Two Types of One-pass Assembler There are two types of one-pass
assembler: Produce object code directly in memory for immediate
execution No loader is needed Load-and-go for program development
and testing Good for computing center where most students
reassemble their programs each time. Can save time for scanning the
source code again Produce the usual kind of object program for
later execution 121
Slide 122
Internal Implementation The assembler generate object code
instructions as it scans the source program. If an instruction
operand is a symbol that has not yet been defined, the operand
address is omitted when the instruction is assembled. The symbol
used as an operand is entered into the symbol table. This entry is
flagged to indicate that the symbol is undefined yet. 122
Slide 123
Internal Implementation (contd) The address of the operand
field of the instruction that refers to the undefined symbol is
added to a list of forward references associated with the symbol
table entry. When the definition of the symbol is encountered, the
forward reference list for that symbol is scanned, and the proper
address is inserted into any instruction previously generated.
123
Slide 124
Processing Example After scanning line 40 124
Slide 125
Processing Example (contd) After scanning line 160 125
Slide 126
Processing Example (contd) Between scanning line 40 and 160: On
line 45, when the symbol ENDFIL is defined, the assembler places
its value in the SYMTAB entry. The assembler then inserts this
value into the instruction operand field (at address 201C). From
this point on, any references to ENDFIL would not be forward
references and would not be entered into a list. At the end of the
processing of the program, any SYMTAB entries that are still marked
with * indicate undefined symbols. These should be flagged by the
assembler as errors. 126
Slide 127
Multi-Pass Assembler If we use a two-pass assembler, the
following symbol definition cannot be allowed. ALPHA EQU BETA BETA
EQU DELTA DELTA RESW1 This is because ALPHA and BETA cannot be
defined in pass 1. Actually, if we allow multi-pass processing,
DELTA is defined in pass 1, BETA is defined in pass 2, and ALPHA is
defined in pass 3, and the above definitions can be allowed. This
is the motivation for using a multi-pass assembler. 127
Slide 128
Multi-Pass Assembler(contd) It is unnecessary for a multi-pass
assembler to make more than two passes over the entire program.
Instead, only the parts of the program involving forward references
need to be processed in multiple passes. The method presented here
can be used to process any kind of forward references. 128
Slide 129
Multi-Pass Assembler Implementation Use a symbol table to store
symbols that are not totally defined yet. For a undefined symbol,
in its entry, We store the names and the number of undefined
symbols which contribute to the calculation of its value. We also
keep a list of symbols whose values depend on the defined value of
this symbol. When a symbol becomes defined, we use its value to
reevaluate the values of all of the symbols that are kept in this
list. The above step is performed recursively. 129
Slide 130
Forward Reference Example 130 LOC:1034
Slide 131
Forward Reference Processing After first line Not defined yet
But one symbol is unknown yet Defined 131
Slide 132
After second line Now defined But two symbols are unknown yet
132
Slide 133
After third line 133
Slide 134
After 4th line Start knowing values 134
Slide 135
After 5th line All symbols are defined and their values are
known now. Start knowing values 135