Chapter 2 Assemblers
System Software
Chih-Shun Hsu
Basic Assembler Functions
  Convert mnemonic operation codes to their machine language equivalents
  Convert symbolic operands to their equivalent machine addresses
  Build the machine instructions in the proper format
  Convert the data constants specified in the source program into their machine representations
  Write the object program and the assembly listing
Two Pass Assembler(2/1)
  Forward reference: a reference to a label that is defined later in the program
  Because of forward references, most assemblers make two passes over the source program
  The first pass does little more than scan the source program for label definitions and assign addresses
  The second pass performs most of the actual translation
  Assembler directives (or pseudo-instructions) provide instructions to the assembler itself
Two Pass Assembler(2/2)
  Pass 1 (define symbols)
    Assign addresses to all statements in the program
    Save the values (addresses) assigned to all labels
    Perform some processing of assembler directives
  Pass 2 (assemble instructions and generate object program)
    Assemble instructions (translating operation codes and looking up addresses)
    Generate data values defined by BYTE, WORD, etc.
    Perform processing of assembler directives not done during Pass 1
    Write the object program and the assembly listing
Assembler Data Structure and Variable
  Two major data structures:
    Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
    Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
  Variable:
    Location Counter (LOCCTR): used to help assign addresses
    LOCCTR is initialized to the beginning address specified in the START statement
    After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR
OPTAB and SYMTAB
  OPTAB must contain the mnemonic operation code and its machine language equivalent
  In more complex assemblers, it also contains information about instruction format and length
  For a machine that has instructions of different lengths, we must search OPTAB in the first pass to find the instruction length for incrementing LOCCTR
  SYMTAB includes the name and value (address) for each label, together with flags to indicate error conditions
  OPTAB and SYMTAB are usually organized as hash tables, with the mnemonic operation code or label name as the key, for efficient retrieval
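The two tables can be sketched with Python dictionaries, which are hash tables. This is a minimal illustration, not the structure of any particular assembler: the `SymTab` class, its method names, and the "duplicate" flag name are assumptions, and the OPTAB subset contains only the opcodes quoted in this chapter's worked examples.

```python
# Hypothetical OPTAB subset: mnemonic -> opcode, values as quoted in this chapter.
OPTAB = {"STL": 0x14, "LDB": 0x68, "JSUB": 0x48, "J": 0x3C, "LDA": 0x00, "STCH": 0x54}


class SymTab:
    """Label -> value (address) plus error flags, keyed by label name."""

    def __init__(self):
        self.table = {}

    def insert(self, label, address):
        """Insert a label; flag the existing entry if the label is a duplicate."""
        if label in self.table:
            self.table[label]["flags"].add("duplicate")
            return False
        self.table[label] = {"value": address, "flags": set()}
        return True

    def lookup(self, label):
        """Return the address of a label, or None if it is not defined."""
        entry = self.table.get(label)
        return None if entry is None else entry["value"]
```

Both lookups run in expected constant time, which is the reason the slide gives for using hash tables.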
Example of a SIC Assembler Language Program (3/1)
Example of a SIC Assembler Language Program (3/2)
int i;  /* declared outside the loop so it is still in scope below */
for (i = 0; i < 4096; i++) {
    scanf("%c", &BUFFER[i]);
    if (BUFFER[i] == 0) break;
}
LENGTH = i;
Example of a SIC Assembler Language Program (3/3)
for (int i = 0; i < LENGTH; i++) {
    printf("%c", BUFFER[i]);
}
Program with Object Code (3/1)
  STL RETADR: opcode 14 followed by address 1033 gives object code 141033
Program with Object Code (3/2)
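  STCH BUFFER,X: opcode 54; indexed addressing sets the x bit, so the address field is 1039+8000=9039 and the object code is 549039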
Program with Object Code (3/3)
SYMTAB
symbol value flags
FIRST 1000
CLOOP 1003
ENDFIL 1015
EOF 102A
THREE 102D
ZERO 1030
RETADR 1033
LENGTH 1036
BUFFER 1039
RDREC 2039
RLOOP 203F
EXIT 2057
INPUT 205D
MAXLEN 205E
WRREC 2061
WLOOP 2064
OUTPUT 2079
Object Program Format
  Header record (H)
    Col. 2-7    Program name
    Col. 8-13   Starting address of object program (Hex)
    Col. 14-19  Length of object program in bytes (Hex)
  Text record (T)
    Col. 2-7    Starting address for object code in this record (Hex)
    Col. 8-9    Length of object code in this record (Hex)
    Col. 10-69  Object code, represented in Hex
  End record (E)
    Col. 2-7    Address of first executable instruction in object program (Hex)
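The record layouts above can be sketched as three small formatting functions. This is an illustration under assumptions: the function names are hypothetical, column 1 is assumed to hold the record-type letter (H, T, or E), and the record length in a Text record is assumed to be counted in bytes.

```python
def header_record(name, start, length):
    """H record: program name in 6 columns, then start address and length in hex."""
    return "H" + f"{name:<6}"[:6] + f"{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T record: start address, byte count, then up to 60 hex digits of object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("object code does not fit in columns 10-69")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """E record: address of the first executable instruction."""
    return f"E{first_executable:06X}"
```

For example, `header_record("COPY", 0x1000, 0x107A)` pads the name to six columns before appending the two hex fields.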
Object Program
Algorithm for Pass 1 of Assembler(3/1)
read first input line
if OPCODE='START' then
  begin
    save #[OPERAND] as starting address
    initialize LOCCTR to starting address
    write line to intermediate file
    read next input line
  end
else
  initialize LOCCTR to 0
while OPCODE≠'END' do
  begin
    if this is not a comment line then
      begin
        if there is a symbol in the LABEL field then
Algorithm for Pass 1 of Assembler(3/2)
          begin
            search SYMTAB for LABEL
            if found then
              set error flag (duplicate symbol)
            else
              insert (LABEL, LOCCTR) into SYMTAB
          end {if symbol}
        search OPTAB for OPCODE
        if found then
          add 3 {instruction length} to LOCCTR
        else if OPCODE='WORD' then
          add 3 to LOCCTR
        else if OPCODE='RESW' then
          add 3 * #[OPERAND] to LOCCTR
Algorithm for Pass 1 of Assembler(3/3)
        else if OPCODE='RESB' then
          add #[OPERAND] to LOCCTR
        else if OPCODE='BYTE' then
          begin
            find length of constant in bytes
            add length to LOCCTR
          end {if BYTE}
        else
          set error flag (invalid operation code)
      end {if not a comment}
    write line to intermediate file
    read next input line
  end {while not END}
write last line to intermediate file
save (LOCCTR-starting address) as program length
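The Pass 1 algorithm above can be sketched in Python as follows. This is a simplified illustration, not a complete assembler: the input is assumed to be already parsed into (label, opcode, operand) tuples with comment lines removed, the intermediate file is omitted, and the names `pass1` and `byte_length` are hypothetical.

```python
def byte_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else len(body) // 2


def pass1(lines, optab):
    """Assign addresses and build SYMTAB; returns (symtab, program_length, errors)."""
    symtab, errors = {}, []
    start = locctr = 0
    if lines and lines[0][1] == "START":
        start = locctr = int(lines[0][2], 16)  # START operand is hexadecimal
        lines = lines[1:]
    for label, opcode, operand in lines:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(("duplicate symbol", label))
            else:
                symtab[label] = locctr
        if opcode in optab or opcode == "WORD":
            locctr += 3                      # SIC instructions and words are 3 bytes
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(("invalid operation code", opcode))
    return symtab, locctr - start, errors
```

The returned program length is LOCCTR minus the starting address, as in the last step of the pseudocode.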
Algorithm for Pass 2 of Assembler(3/1)
read first input line (from intermediate file)
if OPCODE='START' then
  begin
    write listing line
    read next input line
  end {if START}
write Header record to object program
initialize first Text record
while OPCODE≠'END' do
  begin
    if this is not a comment line then
      begin
        search OPTAB for OPCODE
        if found then
          begin
Algorithm for Pass 2 of Assembler(3/2)
            if there is a symbol in OPERAND field then
              begin
                search SYMTAB for OPERAND
                if found then
                  store symbol value as operand address
                else
                  begin
                    store 0 as operand address
                    set error flag (undefined symbol)
                  end
              end {if symbol}
            else
              store 0 as operand address
            assemble the object code instruction
          end {if opcode found}
Algorithm for Pass 2 of Assembler(3/3)
        else if OPCODE='BYTE' or 'WORD' then
          convert constant to object code
        if object code will not fit into the current Text record then
          begin
            write Text record to object program
            initialize new Text record
          end
        add object code to Text record
      end {if not comment}
    write listing line
    read next input line
  end {while not END}
write last Text record to object program
write End record to object program
write last listing line
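The "assemble the object code instruction" step of Pass 2 can be sketched for the standard SIC format (8-bit opcode, 1-bit index flag, 15-bit address). This is an illustration only: the function name `assemble_line` and the convention of writing indexed operands as `SYMBOL,X` in a single string are assumptions.

```python
def assemble_line(opcode, operand, optab, symtab):
    """Return (6-hex-digit object code, errors) for one SIC instruction."""
    errors = []
    indexed = False
    address = 0                      # 0 is stored when there is no operand
    if operand:
        if operand.endswith(",X"):   # indexed addressing sets the x bit
            indexed, operand = True, operand[:-2]
        if operand in symtab:
            address = symtab[operand]
        else:
            errors.append(("undefined symbol", operand))  # address stays 0
    word = (optab[opcode] << 16) | (0x8000 if indexed else 0) | (address & 0x7FFF)
    return f"{word:06X}", errors
```

With STL=14 and RETADR=1033 this yields 141033, and with STCH=54 and BUFFER=1039 in indexed mode it yields 549039, matching the annotations on the object code listing.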
Machine-Dependent Assembler Features
  Indirect addressing is indicated by adding the prefix @ to the operand
  Immediate operands are denoted with the prefix #
  The assembler directive BASE is used in conjunction with base relative addressing
  The extended instruction format is specified with the prefix + added to the operation code
  Register-to-register instructions are faster than the corresponding register-to-memory operations because they are shorter and because they do not require another memory reference
Example of SIC/XE Program(3/1)
Example of SIC/XE Program(3/2)
Example of SIC/XE Program(3/3)
Program with Object Code (3/1)
Object Code Translation
Line 10: STL=14, n=1, i=1 → ni=3, op+ni=14+3=17; RETADR=0030; x=0, b=0, p=1, e=0 → xbpe=2; PC=0003, disp=RETADR-PC=030-003=02D; xbpe+disp=202D; obj=17202D
Line 12: LDB=68, n=0, i=1 → ni=1, op+ni=68+1=69; LENGTH=0033; x=0, b=0, p=1, e=0 → xbpe=2; PC=0006, disp=LENGTH-PC=033-006=02D; xbpe+disp=202D; obj=69202D
Line 15: JSUB=48, n=1, i=1 → ni=3, op+ni=48+3=4B; RDREC=01036; x=0, b=0, p=0, e=1 → xbpe=1; xbpe+RDREC=101036; obj=4B101036
Line 40: J=3C, n=1, i=1 → ni=3, op+ni=3C+3=3F; CLOOP=0006; x=0, b=0, p=1, e=0 → xbpe=2; PC=001A, disp=CLOOP-PC=0006-001A=-14=FEC (2's complement); xbpe+disp=2FEC; obj=3F2FEC
Line 55: LDA=00, n=0, i=1 → ni=1, op+ni=00+1=01; operand #3 → disp=003; x=0, b=0, p=0, e=0 → xbpe=0; xbpe+disp=0003; obj=010003
Format 3: op(6) n i x b p e disp(12)
Format 4: op(6) n i x b p e address(20)
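The bit layouts of Formats 3 and 4 and the translations worked out above can be checked with a short sketch. The function names are hypothetical, and the displacement range of -2048 to 2047 for program-counter relative addressing is assumed from the 12-bit signed disp field.

```python
def format3(opcode, n, i, x, b, p, disp):
    """op(6) n i | x b p e=0 | disp(12), returned as 6 hex digits."""
    first = opcode | (n << 1) | i
    xbpe = (x << 3) | (b << 2) | (p << 1)
    return f"{first:02X}{xbpe:X}{disp & 0xFFF:03X}"


def format4(opcode, n, i, x, address):
    """op(6) n i | x b=0 p=0 e=1 | address(20), returned as 8 hex digits."""
    first = opcode | (n << 1) | i
    xbpe = (x << 3) | 1
    return f"{first:02X}{xbpe:X}{address & 0xFFFFF:05X}"


def pc_relative(opcode, target, pc, n=1, i=1, x=0):
    """Format 3 with p=1; disp = target - PC, stored in 12-bit 2's complement."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of range for PC-relative addressing")
    return format3(opcode, n, i, x, 0, 1, disp)
```

For instance, `pc_relative(0x3C, 0x0006, 0x001A)` reproduces the backward jump of Line 40, where the negative displacement is masked to 12 bits.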
Program with Object Code (3/2)
Object Code Translation
Line 125: CLEAR=B4, r1=X=1, r2=0, obj=B410
Line 133: LDT=74, n=0, i=1 → ni=1, op+ni=74+1=75; x=0, b=0, p=0, e=1 → xbpe=1; #4096=01000; xbpe+address=101000; obj=75101000
Line 160: STCH=54, n=1, i=1 → ni=3, op+ni=54+3=57; BUFFER=0036, B=0033, disp=BUFFER-B=003; x=1, b=1, p=0, e=0 → xbpe=C; xbpe+disp=C003; obj=57C003
Format 2: op(8) r1(4) r2(4)
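Format 2 and base relative addressing can be sketched the same way. The function names are hypothetical. Only X=1 is confirmed by the text above; the other register numbers are the usual SIC/XE assignments and are included as an assumption, as is the 0 to 4095 range for a base relative displacement.

```python
# Register numbers: X=1 appears in the text; the rest are assumed SIC/XE values.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6}


def format2(opcode, r1, r2=None):
    """op(8) r1(4) r2(4); a missing second register is encoded as 0."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 else 0
    return f"{opcode:02X}{n1:X}{n2:X}"


def base_relative(opcode, target, base, x=0):
    """Format 3 with n=i=1, b=1, p=0; disp = target - (B), unsigned 12 bits."""
    disp = target - base
    if not 0 <= disp <= 4095:
        raise ValueError("displacement out of range for base relative addressing")
    first = opcode | 0b11
    xbpe = (x << 3) | 0b0100
    return f"{first:02X}{xbpe:X}{disp:03X}"
```

`format2(0xB4, "X")` corresponds to Line 125, and `base_relative(0x54, 0x0036, 0x0033, x=1)` to Line 160.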
Program with Object Code (3/3)
SYMTAB
SYMBOL  VALUE  FLAGS
FIRST   0000
CLOOP   0006
ENDFIL  001A
EOF     002D
RETADR  0030
LENGTH  0033
BUFFER  0036
RDREC   1036
RLOOP   1040
EXIT    1056
INPUT   105C
WRREC   105D
WLOOP   1062
OUTPUT  1076
Program Relocation
  The actual starting address of the program is not known until load time
  An object program that contains the information necessary to modify its addresses at load time is called a relocatable program
  No modification is needed when the operand uses program-counter relative or base relative addressing
  The only parts of the program that require modification at load time are those that specify direct (as opposed to relative) addresses
  Modification record
    Col. 2-7  Starting location of the address field to be modified, relative to the beginning of the program (Hex)
    Col. 8-9  Length of the address field to be modified, in half-bytes (Hex)
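A sketch of how a Modification record might be produced for a Format 4 instruction and then applied at load time. The assumptions here: column 1 holds the letter M, the 20-bit address field of a Format 4 instruction starts one byte after the instruction address and is 5 half-bytes long, and a field with an odd number of half-bytes begins in the low-order half of its first byte. The function names are hypothetical.

```python
def modification_record(instruction_address):
    """M record for the 20-bit address field of a Format 4 instruction."""
    return f"M{instruction_address + 1:06X}05"


def relocate(memory, field_start, half_bytes, load_address):
    """Add the actual load address to one address field of the program image."""
    nbytes = (half_bytes + 1) // 2
    value = int.from_bytes(memory[field_start:field_start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # bits that belong to the field
    new = (value & ~mask) | ((value + load_address) & mask)
    memory[field_start:field_start + nbytes] = new.to_bytes(nbytes, "big")
```

Applied to the instruction 4B101036 with a load address of 5000 (hex), the address field 01036 becomes 06036 while the flag bits in the upper half-byte are left unchanged.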
Examples of Program Relocation
Object Program
Machine-Independent Assembler Features
  Literals
  Symbol-defining statements
  Expressions
  Program blocks
  Control sections and program linking
Program with Additional Assembler Features(3/1)
Program with Additional Assembler Features(3/2)
Program with Additional Assembler Features(3/3)
Literals(2/1)
  Write the value of a constant operand as a part of the instruction that uses it
  Such an operand is called a literal
  Avoid having to define the constant elsewhere in the program and make up a label for it
  A literal is identified with the prefix =, which is followed by a specification of the literal value
  Examples of literals in the statements:
    45   001A  ENDFIL  LDA  =C'EOF'  032010
    215  1062  WLOOP   TD   =X'05'   E32011
Literals(2/2)
  With a literal, the assembler generates the specified value as a constant at some other memory location
  The address of this generated constant is used as the target address for the machine instruction
  All of the literal operands used in the program are gathered together into one or more literal pools
  Normally literals are placed into a pool at the end of the program
  An LTORG statement creates a literal pool that contains all of the literal operands used since the previous LTORG
  Most assemblers recognize duplicate literals (the same literal used in more than one place) and store only one copy of the specified data value
  LITTAB (literal table): contains the literal name, the operand value and length, and the address assigned to the operand when it is placed in a literal pool
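A literal table with the fields listed above might be sketched as follows. This is illustrative only: the class and method names are hypothetical, and duplicates are detected by comparing literal names, so two different spellings of the same value would not be merged.

```python
def literal_bytes(literal):
    """Value of a literal such as =C'EOF' or =X'05' as raw bytes."""
    kind, body = literal[1], literal[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)


class LitTab:
    def __init__(self):
        self.entries = {}  # literal name -> value, length, address (None until pooled)

    def add(self, literal):
        """Record a literal operand; a duplicate name is stored only once."""
        if literal not in self.entries:
            data = literal_bytes(literal)
            self.entries[literal] = {"value": data, "length": len(data), "address": None}

    def place_pool(self, locctr):
        """At LTORG or end of program: assign addresses to unplaced literals."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr  # updated location counter after the pool
```

In the two example statements, the displacements 010 and 011 imply that =C'EOF' was placed at 002D and =X'05' at 1076; calling `place_pool` at those locations reproduces those addresses.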
Symbol-Defining Statements
  EQU: assembler directive that allows the programmer to define symbols and specify their values
  General form: symbol EQU value
  Line 133: +LDT #4096 can be written as
    MAXLEN  EQU   4096
            +LDT  #MAXLEN
  It is much easier to find and change the value of MAXLEN
  ORG: assembler directive that indirectly assigns values to symbols
  Symbol table fields defined with EQU:
    STAB    RESB  1100
    SYMBOL  EQU   STAB
    VALUE   EQU   STAB+6
    FLAGS   EQU   STAB+9
  The same fields defined with ORG:
    STAB    RESB  1100
            ORG   STAB
    SYMBOL  RESB  6
    VALUE   RESW  1
    FLAGS   RESB  2
            ORG   STAB+1100
Expressions
  Assemblers allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /
  Individual terms in the expression may be constants, user-defined symbols, or special terms
  The most common such special term is the current value of the location counter (designated by *)
  Expressions are classified as either absolute expressions or relative expressions
  Symbol   Type  Value
  RETADR   R     0030
  BUFFER   R     0036
  BUFFEND  R     1036
  MAXLEN   A     1000
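The absolute/relative classification can be sketched for expressions that use only + and -. The rule applied here is not spelled out on the slide and is stated as an assumption: an expression is absolute if its relative terms cancel in pairs of opposite sign, relative if exactly one positive relative term is left over, and illegal otherwise. The function name and the (sign, name) representation are hypothetical.

```python
def classify(terms, symtypes):
    """terms: list of (sign, name), sign is +1 or -1; symtypes: name -> 'A' or 'R'.

    Names missing from symtypes (for example numeric constants) count as absolute.
    """
    net = sum(sign for sign, name in terms if symtypes.get(name, "A") == "R")
    if net == 0:
        return "A"   # relative terms cancel: value does not depend on load address
    if net == 1:
        return "R"   # one unpaired positive relative term
    raise ValueError("expression is neither absolute nor relative")
```

With the symbol types from the table above, BUFFEND-BUFFER is absolute, which is consistent with MAXLEN being listed as type A with value 1036-0036=1000.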
Program Block(2/1)
  Program blocks: segments of code that are rearranged within a single object unit
  Control sections: segments that are translated into independent object program units
  USE indicates which portions of the source program belong to the various blocks
  Block name  Block number  Address  Length
  (default)   0             0000     0066
  CDATA       1             0066     000B
  CBLKS       2             0071     1000
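The Address column of the block table follows from the block lengths: each block starts where the previous one ends. A minimal sketch, with a hypothetical function name, assuming the block lengths are already known at the end of Pass 1:

```python
def assign_block_addresses(lengths):
    """lengths: ordered list of (block name, length in bytes).

    Returns rows of (name, block number, starting address, length).
    """
    table, address = [], 0
    for number, (name, length) in enumerate(lengths):
        table.append((name, number, address, length))
        address += length
    return table
```

A symbol's final address is then its address relative to the start of its block plus that block's starting address.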
Program Block(2/2)
  Because the large buffer area is moved to the end of the object program, we no longer need to use extended format instructions
  Program readability is improved if the definitions of data areas are placed in the source program close to the statements that reference them
  It does not matter that the Text records of the object program are not in sequence by address; the loader will simply load the object code from each record at the indicated address
Example Program with Multiple Program Blocks(3/1)
Example Program with Multiple Program Blocks(3/2)
Example Program with Multiple Program Blocks(3/3)
Program Blocks Traced Through Assembly and Loading Processes
Object Program
Control sections(3/1)
  References between control sections are called external references
  The assembler generates information for each external reference that will allow the loader to perform the required linking
  The EXTDEF (external definition) statement in a control section names symbols, called external symbols, that are defined in this section and may be used by other sections
  The EXTREF (external reference) statement names symbols that are used in this control section and are defined elsewhere
Control sections(3/2)
  Define record (D)
    Col. 2-7    Name of external symbol defined in this control section
    Col. 8-13   Relative address of symbol within this control section (Hex)
    Col. 14-73  Repeat information in Col. 2-13 for other external symbols
  Refer record (R)
    Col. 2-7    Name of external symbol referred to in this control section
    Col. 8-73   Names of other external reference symbols
Control sections(3/3)
  Modification record (revised: M)
    Col. 2-7    Starting address of the field to be modified, relative to the beginning of the control section (Hex)
    Col. 8-9    Length of the field to be modified, in half-bytes (Hex)
    Col. 10     Modification flag (+ or -)
    Col. 11-16  External symbol whose value is to be added to or subtracted from the indicated field
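The Define, Refer, and revised Modification records can be sketched as formatting helpers. The function names are hypothetical, column 1 is assumed to hold the record letter, and symbol names are assumed to be left-justified and padded with blanks to 6 columns.

```python
def pad(name):
    """Left-justify a symbol name in exactly 6 columns."""
    return f"{name:<6}"[:6]


def define_record(symbols):
    """D record from a list of (external symbol, relative address)."""
    return "D" + "".join(f"{pad(name)}{address:06X}" for name, address in symbols)


def refer_record(names):
    """R record listing the external symbols referred to in this section."""
    return "R" + "".join(pad(name) for name in names)


def revised_modification_record(start, half_bytes, sign, symbol):
    """M record with modification flag and external symbol name."""
    if sign not in ("+", "-"):
        raise ValueError("modification flag must be + or -")
    return f"M{start:06X}{half_bytes:02X}{sign}{pad(symbol)}"
```

The symbol names and addresses used to exercise these helpers are illustrative and not taken from the example program's listing.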
Example Program with Control Sections(3/1)
Example Program with Control Sections(3/2)
Example Program with Control Sections(3/3)
Object Program(2/1)
Object Program(2/2)
One-Pass Assemblers
  Eliminate forward references to data: require that all data areas be defined in the source program before they are referenced
  One type of one-pass assembler generates its object code in memory for immediate execution (load-and-go)
  A load-and-go assembler is useful in a system that is oriented toward program development and testing
Handle Forward Reference
  The symbol used as an operand is entered into the symbol table
  This entry is flagged to indicate that the symbol is undefined
  The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry
  When the definition for a symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instructions previously generated
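The forward-reference handling described above can be sketched as a symbol table that keeps a list of pending operand-field addresses for each undefined symbol. This is a simplified illustration: the class name is hypothetical, and memory is modeled as a dictionary from operand-field address to the address value stored there rather than as real instruction bytes.

```python
class OnePassSymTab:
    def __init__(self, memory):
        self.memory = memory   # operand-field address -> stored operand address
        self.defined = {}      # symbol -> value
        self.pending = {}      # undefined symbol -> list of operand-field addresses

    def reference(self, symbol, field_address):
        """Use a symbol as an operand of an instruction being generated."""
        if symbol in self.defined:
            self.memory[field_address] = self.defined[symbol]
        else:
            self.memory[field_address] = 0   # placeholder until the definition
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, value):
        """Definition encountered: patch every instruction that referred to it."""
        self.defined[symbol] = value
        for field_address in self.pending.pop(symbol, []):
            self.memory[field_address] = value

    def undefined(self):
        """Symbols still undefined; any left at the end of the program are errors."""
        return sorted(self.pending)
```

The addresses used when exercising this sketch are made up for illustration.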
Sample Program for One-Pass assembler(3/1)
Sample Program for One-Pass assembler(3/2)
Sample Program for One-Pass assembler(3/3)
Example of Handling Forward Reference(2/1)
Example of Handling Forward Reference(2/2)
Multi-Pass Assemblers(6/1)
HALFSZ   EQU   MAXLEN/2
MAXLEN   EQU   BUFFEND-BUFFER
PREVBT   EQU   BUFFER-1
……….
BUFFER   RESB  4096
BUFFEND  EQU   *
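One simple way to resolve definitions like these is to rescan the pending EQU statements until no more can be evaluated. This sketch only illustrates why more than two passes may be needed (HALFSZ depends on MAXLEN, which depends on symbols defined later); it is not necessarily the method shown on the slides. The function name is hypothetical, expressions are modeled as Python callables, and the address chosen for BUFFER is made up.

```python
def resolve_symbols(definitions, known):
    """definitions: name -> function(values) that raises KeyError if an operand
    is still undefined; known: symbols whose values are already available."""
    values = dict(known)
    pending = dict(definitions)
    while pending:
        progress = False
        for name, expr in list(pending.items()):
            try:
                values[name] = expr(values)
            except KeyError:
                continue             # depends on a symbol not yet defined
            del pending[name]
            progress = True
        if not progress:             # circular or missing definitions
            raise ValueError(f"unresolvable symbols: {sorted(pending)}")
    return values
```

With BUFFER placed at a hypothetical address and BUFFEND 4096 bytes later, MAXLEN and PREVBT resolve in the first scan and HALFSZ in the second.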
Multi-Pass Assemblers(6/2)
Multi-Pass Assemblers(6/3)
Multi-Pass Assemblers(6/4)
Multi-Pass Assemblers(6/5)
Multi-Pass Assemblers(6/6)
MASM Assembler
  An MASM assembler language program is written as a collection of segments
  Commonly used classes are CODE, DATA, CONST, and STACK
  During program execution, segments are addressed via the x86 segment registers
  ASSUME tells MASM the contents of a segment register; a programmer must provide instructions to load this register when the program is executed
  A near jump is a jump to a target in the same code segment; a far jump is a jump to a target in a different code segment
SPARC Assembler
  A SPARC assembler language program is divided into units called sections
    .TEXT    Executable instructions
    .DATA    Initialized read/write data
    .RODATA  Read-only data
    .BSS     Uninitialized data areas
  A global symbol is a symbol that is defined in the program and made accessible to others
  A weak symbol is similar to a global symbol, but the definition of a weak symbol may be overridden by a global symbol with the same name
  SPARC branch instructions are delayed branches: the instruction immediately following a branch instruction is actually executed before the branch is taken
  Programmers often place NOP (no-operation) instructions in delay slots