Assembler 1


CS 9303 SYSTEM SOFTWARE INTERNALS

V.P JAYA CHITRA

Computer Technology Dept

Course Objective

This course helps learners understand the basic functions of system software components, viz. assemblers, loaders, linkers, macro processors and compilers. It also discusses the design and implementation of assemblers and macro processors with examples. It then introduces the concept of a virtual machine and the object-oriented features it supports. The performance of emulation techniques is also analyzed. As a prerequisite, the learner should have had some exposure to elementary data structures and assembly language.

SCOPE

At the end of the course, the learners will be able to:

• Design and implement assemblers.
• Understand and analyze the features of loaders and linkers.
• Design and implement a macro processor.
• Understand the design and operation of compilers.
• Analyze the implementation of a virtual machine that supports object-oriented programming features.
• Analyze the performance of emulation techniques.

UNIT PLAN

UNIT 1

Session     Title

Session 1   Machine instructions and programs
Session 2   Assemblers: basic assembler functions
Session 3   Simple SIC assembler
Session 4   Assembler algorithm and data structures
Session 5   Machine-dependent assembler features: i) Instruction formats and addressing modes
Session 6   Machine-dependent assembler features: ii) Program relocation
Session 7   Machine-independent assembler features: i) Literals ii) Symbol-defining statements iii) Expressions
Session 8   Machine-independent assembler features: iv) Program blocks v) Control sections and program linking

Unit 1:Review of Computer Architecture


Objective: This unit explains the basic concepts of program assembly using the SIC machine. It begins with a discussion of the relationship between system software and machine architecture. The machine-dependent and machine-independent features of assemblers are then discussed, and the essentials of one-pass and two-pass assemblers are presented. As a result, this unit aids in the design and implementation of an assembler.

Machine Instructions and programs

Session 1

Introduction to system software

Application software
• Usually used by the end user.
• Concerned with the solution of some problem, using the computer as a tool.

System software
• Consists of a variety of programs that support the operation of a computer.
• Acts as an intermediary between users and hardware.
• Creates a virtual environment for the user that hides the actual computer architecture.
• Virtual machine: the set of services and resources created by the system software and seen by the user.
• The characteristic in which most system software differs from application software is machine dependency.

System software…

Figure 1.1 The Role of System Software (diagram labels: Virtual Machine, Interface A / Virtual Machine Interface, System Software, Interface B / Actual Machine Interface, Hardware)

System software…

Components

Language services
• Let users write programs in a high-level, user-oriented language and then execute them, i.e. translators:
  – assembler
  – compiler
  – interpreter

Memory managers
• Allocate and retrieve memory space:
  – loader
  – linker

Other utilities
• Collections of library routines that provide services either to user or system routines.
  – DBMS, editor, debugger, ...

System software…

Compiler: Translates high-level language to assembly language.

Assembler: Translates assembly language to machine language (object files).

Linker: Builds an executable file from a collection of object files.

Loader: Reads instructions from the object file and stores them into memory for execution.

Issues in System Software

Advanced architectures complicate system software
• Superscalar CPU
• Memory model
• Multiprocessor

New applications
• Embedded systems
• Mobile computing

Machine Instructions and programs

Instruction set
– Load and store registers: LDA, LDX, STA, STX, etc.
– Integer arithmetic operations: ADD, SUB, MUL, DIV. All arithmetic operations involve register A and a word in memory, with the result being left in A.
– COMP: compares the value in register A with a word in memory and sets the condition code.
– Conditional jump instructions: JLT, JEQ, JGT
– Subroutine linkage: JSUB, RSUB
– I/O (transferring 1 byte at a time to/from the rightmost 8 bits of register A): Test Device (TD), Read Data (RD), Write Data (WD)

Basic Assembler Functions

Session 2

Introduction to Assemblers

Assembler functions:
• Translating mnemonic operation codes to their machine language equivalents (mnemonic code to machine code).
• Assigning machine addresses to symbolic labels (symbols to addresses).
• Handling constants, literals and addressing.

Assembly language: a symbolic representation of machine instructions.

Assemblers…

Figure 1.2 Compilation pipeline: Source Program → Assembler → Object Code → Linker → Executable Code → Loader

Assemblers…

Basic assembler directives:

START   Specifies the name and starting address of the program
END     Indicates the end of the program
BYTE    Generates a character or hexadecimal constant
WORD    Generates a one-word integer constant
RESB    Reserves the indicated number of bytes for a data area
RESW    Reserves the indicated number of words for a data area

SIC Assembler

Assembler functions:
• Convert mnemonic operation codes to their machine-level equivalents.
  – Mnemonic code (or instruction name) to opcode.
• Convert symbolic operands to their equivalent machine addresses (requires two passes).
  – Symbolic operands (e.g., variable names) to addresses.
• Build the machine instructions in the proper format.
• Convert data constants specified in the source program into their internal machine representations.
  – Constants to numbers.
• Write the object program and the assembly listing.

SIC Assembler

Session 3

SIC Assembler…

Issue: address translation
– The program contains forward references.
  • A forward reference is a reference to a label that is defined later in the program.
– Translation therefore requires two passes:
  • one to scan label definitions and assign addresses;
  • one to perform the actual translation (object code).

Example Program with Object Code

Line  Loc   Source statement              Object code

5     1000  COPY    START   1000
10    1000  FIRST   STL     RETADR        141033
15    1003  CLOOP   JSUB    RDREC         482039
20    1006          LDA     LENGTH        001036
25    1009          COMP    ZERO          281030
30    100C          JEQ     ENDFIL        301015
35    100F          JSUB    WRREC         482061
40    1012          J       CLOOP         3C1003
45    1015  ENDFIL  LDA     EOF           00102A
50    1018          STA     BUFFER        0C1039
55    101B          LDA     THREE         00102D
60    101E          STA     LENGTH        0C1036
65    1021          JSUB    WRREC         482061
70    1024          LDL     RETADR        081033
75    1027          RSUB                  4C0000
80    102A  EOF     BYTE    C'EOF'        454F46
85    102D  THREE   WORD    3             000003
90    1030  ZERO    WORD    0             000000
95    1033  RETADR  RESW    1
100   1036  LENGTH  RESW    1
105   1039  BUFFER  RESB    4096
110         .
115         .       SUBROUTINE TO READ RECORD INTO BUFFER

Fig. 1.3 Example Program

Contd..

Line  Loc   Source statement              Object code

120         .
125   2039  RDREC   LDX     ZERO          041030
130   203C          LDA     ZERO          001030
135   203F  RLOOP   TD      INPUT         E0205D
140   2042          JEQ     RLOOP         30203F
145   2045          RD      INPUT         D8205D
150   2048          COMP    ZERO          281030
155   204B          JEQ     EXIT          302057
160   204E          STCH    BUFFER,X      549039
165   2051          TIX     MAXLEN        2C205E
170   2054          JLT     RLOOP         38203F
175   2057  EXIT    STX     LENGTH        101036
180   205A          RSUB                  4C0000
185   205D  INPUT   BYTE    X'F1'         F1
190   205E  MAXLEN  WORD    4096          001000
195         .
200         .       SUBROUTINE TO WRITE RECORD FROM BUFFER
205         .
210   2061  WRREC   LDX     ZERO          041030
215   2064  WLOOP   TD      OUTPUT        E02079
220   2067          JEQ     WLOOP         302064
225   206A          LDCH    BUFFER,X      509039
230   206D          WD      OUTPUT        DC2079
235   2070          TIX     LENGTH        2C1036
240   2073          JLT     WLOOP         382064
245   2076          RSUB                  4C0000
250   2079  OUTPUT  BYTE    X'05'         05
255                 END     FIRST

Fig. 1.4 Example Program

Object code…

Purpose
– Reads records from the input device (code F1).
– Copies them to the output device (code 05).
– At the end of the file, writes EOF on the output device, then executes RSUB to return to the operating system.

Data transfer (RD, WD)
– A buffer is used to store each record.
– Buffering is necessary because the devices have different I/O rates.
– The end of each record is marked with a null character (hexadecimal 00).
– The end of the file is indicated by a zero-length record.

Subroutines (JSUB, RSUB)
– RDREC, WRREC
– The link register must be saved before a nested jump.

Object Program

The object program is the output generated by the assembler. The object program format contains three types of records:

Header   Contains the program name, starting address and length.

Text     Contains the translated code and data of the program, with the addresses where they are to be loaded.

End      Marks the end of the object program and specifies the address of the first executable instruction.

Object Program

Header record:
Col. 1       H
Col. 2-7     Program name
Col. 8-13    Starting address (hex)
Col. 14-19   Length of object program in bytes (hex)

Text record:
Col. 1       T
Col. 2-7     Starting address in this record (hex)
Col. 8-9     Length of object code in this record in bytes (hex)
Col. 10-69   Object code: (69-10+1)/6 = 10 instructions

End record:
Col. 1       E
Col. 2-7     Address of first executable instruction (hex)

Fig 1.5 Object Program
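
The record layouts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (header_record, text_record, end_record) are hypothetical, and the fields are simply concatenated in fixed-width columns with no separator character.

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', col. 2-7 program name, col. 8-13 start, col. 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list) -> str:
    """Col. 1 'T', col. 2-7 start address, col. 8-9 byte count, col. 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:  # columns 10-69 hold 60 hex digits = 30 bytes
        raise ValueError("a Text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction: int) -> str:
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"


if __name__ == "__main__":
    print(header_record("COPY", 0x1000, 0x107A))
    print(text_record(0x2073, ["382064", "4C0000", "05"]))
    print(end_record(0x1000))
```

The values in the usage lines are taken from the example program: the program name COPY, the start address 1000, the length 107A, and the last three object codes of the listing.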

Contd…


H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061 ...
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030 …
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064 …
T^002073^07^382064^4C0000^05
E^001000    (starting address)

Fig 1.6 Object program corresponding to Fig 1.3 and Fig 1.4

The caret (^) is used only to separate fields for readability; it is not part of the actual object program.

Contd…

Pass 1 (define symbols)
1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.

Pass 2 (assemble instructions and generate object program)
1. Assemble instructions.
2. Generate data values defined by BYTE, WORD.
3. Perform processing of assembler directives not done in Pass 1.
4. Write the object program and the assembly listing.
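
As a rough illustration of the Pass 1 steps listed above, the following Python sketch assigns addresses with a location counter and records label values in a symbol table. It is a minimal sketch, not the course's algorithm: the function name pass1, the tuple-based source format, and the short sample program are assumptions made for the example, and opcode validation against OPTAB is omitted.

```python
def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE operand such as C'EOF' or X'F1'."""
    if operand[0] == "C":
        return len(operand) - 3          # one byte per character
    return (len(operand) - 3) // 2       # two hex digits per byte


def pass1(lines):
    """lines: (label, opcode, operand) tuples. Returns (symtab, program_length)."""
    symtab = {}
    first = lines[0]
    start = int(first[2], 16) if first[1] == "START" else 0
    locctr = start
    for label, opcode, operand in lines:
        if opcode == "START":
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr       # save the label's address for Pass 2
        if opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            locctr += 3                  # WORD and every SIC instruction: 3 bytes
    return symtab, locctr - start


# Hypothetical fragment in the style of the example program.
SAMPLE = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "THREE"),
    ("", "RSUB", ""),
    ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]

if __name__ == "__main__":
    table, length = pass1(SAMPLE)
    for name, address in table.items():
        print(f"{name:<8}{address:04X}")
    print(f"length  {length:04X}")
```

Note that RETADR is used by the first instruction before it is defined; Pass 1 does not need its value, which is why the forward reference causes no problem at this stage.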

Assembler Algorithm and Data Structures

SESSION 4

Assembler Algorithm and Data Structures

OPTAB (operation code table)
– Contents: mnemonic, machine code (instruction format, length), etc.
– Static table
– Gives the instruction length
– Array or hash table, easy to search

SYMTAB (symbol table)
– Contents: label name, value, flag, (type, length), etc.
– Dynamic table (insert, delete, search)
– Hash table; keys are non-random, so the hashing function must be chosen with care

Location counter (LOCCTR)
– Counted in bytes
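
To show how OPTAB and SYMTAB are used together when an instruction is assembled in Pass 2, here is a small Python sketch for the standard SIC format (8-bit opcode, 1-bit index flag, 15-bit address). The helper name assemble_sic is hypothetical, OPTAB holds only the mnemonics that appear in the example program, and the SYMTAB values are copied from its Loc column.

```python
# Static operation code table: mnemonic -> opcode (subset of the SIC set).
OPTAB = {
    "LDA": 0x00, "LDX": 0x04, "LDL": 0x08, "STA": 0x0C, "STX": 0x10,
    "STL": 0x14, "COMP": 0x28, "TIX": 0x2C, "JEQ": 0x30, "JLT": 0x38,
    "J": 0x3C, "JSUB": 0x48, "RSUB": 0x4C, "LDCH": 0x50, "STCH": 0x54,
    "RD": 0xD8, "WD": 0xDC, "TD": 0xE0,
}


def assemble_sic(mnemonic: str, operand: str, symtab: dict) -> str:
    """Return the 3-byte object code of one SIC instruction as 6 hex digits."""
    opcode = OPTAB[mnemonic]                 # KeyError means an invalid mnemonic
    indexed = operand.endswith(",X")
    if indexed:
        operand = operand[:-2]
    address = symtab[operand] if operand else 0
    word = (opcode << 16) | (0x8000 if indexed else 0) | address
    return f"{word:06X}"


if __name__ == "__main__":
    symtab = {"RETADR": 0x1033, "BUFFER": 0x1039}   # values built during Pass 1
    print(assemble_sic("STL", "RETADR", symtab))
    print(assemble_sic("STCH", "BUFFER,X", symtab))
    print(assemble_sic("RSUB", "", symtab))
```

A Python dict is a hash table, which matches the "easy to search" requirement for both tables.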

Contd…

Figure: Source program → Pass 1 → Intermediate file → Pass 2 → Object code, with OPTAB and SYMTAB consulted by the passes.

Algorithm for pass1 assembler


Algorithm for pass 2 Assembler


Assembler Features

SESSION 5

Assembler Features

Machine-dependent assembler features
– Instruction formats and addressing modes
– Program relocation

Machine-independent assembler features
– Literals
– Symbol-defining statements
– Expressions
– Program blocks
– Control sections and program linking

Instruction Format and Addressing Mode

Addressing Modes:

Extended format:        +op m
Indirect addressing:    op @m
Immediate addressing:   op #c
Indexed addressing:     op m,X
Relative addressing:    op m

Instruction Format and Addressing Mode

• A START directive that specifies a beginning program address of 0 indicates a relocatable program.
• Register-to-register instructions: simply convert the mnemonic register names to their numeric equivalents.
  – OPTAB: for opcodes
  – SYMTAB: preloaded with register names and their values
• Fetching a value stored in a register is much faster than fetching it from memory, which improves execution speed.

Contd…

PC or base relative addressing
– Calculate the displacement.
– The displacement must be small enough to fit in the 12-bit field (-2048..2047 for PC relative mode, 0..4095 for base relative mode).
– Using format 3 rather than format 4 saves one byte per instruction:
  • reduces program storage space;
  • reduces instruction fetch time.
– Relocation is easier.

Extended instruction format (4 bytes)
– 20-bit field for direct addressing

Contd…

Immediate addressing mode is used whenever possible.
– The operand is already included in the fetched instruction, so there is no need to fetch it from memory.

Indirect addressing mode is used whenever possible.
– Just one instruction rather than two is enough.

Examples:

Relocatable programs: the starting address is 0.

5     0000  COPY    START   0

Register-to-register instructions:

125   1036  RDREC   CLEAR   X             B410
150   1049          COMPR   A,S           A004

Simple addressing with extended format instructions (bit e = 1):

15    0006  CLOOP   +JSUB   RDREC         4B101036

Contd…

PC-relative

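STL RETADR
• The PC holds 0003 after the instruction is fetched, so the displacement is RETADR (0030) - 0003 = 2D.
• Bits n, i and p are set to 1.

J CLOOP
• The operand address is 0006 and the PC is 001A.
• The displacement is 6 - 1A = -14 hexadecimal (FEC in 12-bit 2's complement).

10    0000  FIRST   STL     RETADR        17202D

40    0017          J       CLOOP         3F2FEC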
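
The two displacement calculations above can be reproduced with a short Python sketch. The function name pc_relative_format3 and its parameters are assumptions made for this illustration; the opcodes (STL = 14, J = 3C) are the standard SIC/XE values.

```python
def pc_relative_format3(opcode: int, target: int, next_addr: int,
                        n: int = 1, i: int = 1, x: int = 0) -> str:
    """Encode a format 3 instruction with PC-relative addressing (p = 1).

    next_addr is the PC value when the instruction executes, i.e. the
    address of the following instruction.
    """
    disp = target - next_addr
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit in 12 bits for PC-relative mode")
    first_byte = opcode | (n << 1) | i        # 6 opcode bits followed by n and i
    flags = (x << 3) | (0 << 2) | (1 << 1)    # x b p e, with b = 0, p = 1, e = 0
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"


if __name__ == "__main__":
    # STL RETADR at 0000: RETADR = 0030, PC = 0003
    print(pc_relative_format3(0x14, 0x0030, 0x0003))
    # J CLOOP at 0017: CLOOP = 0006, PC = 001A
    print(pc_relative_format3(0x3C, 0x0006, 0x001A))
```

Masking with 0xFFF produces the 12-bit two's complement form, which is how the negative displacement -14 (hex) becomes FEC.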

Contd…

Base relative:

• LDB #LENGTH loads the base register with the address of LENGTH (0033); BASE LENGTH declares that value to the assembler.
• The directives BASE and NOBASE do not generate code.

12                  LDB     #LENGTH
13                  BASE    LENGTH

• The address of BUFFER is 0036 and the base register contains 0033.
• Displacement: 0036 - 0033 = 0003. Note: bits x and b are 1.

160   104E          STCH    BUFFER,X      57C003
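
A possible way to express the choice between the two relative modes is sketched below in Python: try PC-relative first and fall back to base-relative when the displacement is out of range. The function name assemble_format3 is hypothetical, and the PC value 1051 is derived from the instruction address 104E plus its 3-byte length.

```python
def assemble_format3(opcode: int, target: int, pc: int,
                     base=None, indexed: bool = False) -> str:
    """Encode a format 3 instruction with simple addressing (n = i = 1)."""
    x = 0x8 if indexed else 0x0
    disp = target - pc
    if -2048 <= disp <= 2047:
        flags = x | 0x2                       # p = 1: PC-relative
    elif base is not None and 0 <= target - base <= 4095:
        disp = target - base
        flags = x | 0x4                       # b = 1: base-relative
    else:
        raise ValueError("neither relative mode fits; format 4 would be needed")
    return f"{opcode | 0x3:02X}{flags:X}{disp & 0xFFF:03X}"


if __name__ == "__main__":
    # STCH BUFFER,X at 104E: BUFFER = 0036, PC = 1051, base register = 0033
    print(assemble_format3(0x54, 0x0036, pc=0x1051, base=0x0033, indexed=True))
```

Here the PC-relative displacement would be 0036 - 1051, far outside -2048..2047, so the sketch selects base-relative mode and produces displacement 003 with bits x and b set.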

Contd…

Immediate addressing

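• The operand (3) is part of the instruction. Bit i = 1 (with n = 0) indicates immediate addressing.

55    0020          LDA     #3            010003

• The operand (4096) does not fit in 12 bits. The "+" prefix indicates extended format (bit e = 1).

133   103C          +LDT    #4096         75101000

• With a symbol as operand, the "#" prefix acts as an address-of operator: the address of LENGTH, not its contents, is loaded.

12    0003          LDB     #LENGTH       69202D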
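
The immediate and extended encodings above can be checked with a small Python sketch. The function names immediate_format3 and format4 are hypothetical; the opcodes used (LDA = 00, LDT = 74, JSUB = 48) are the standard SIC/XE values.

```python
def immediate_format3(opcode: int, value: int) -> str:
    """Format 3 with immediate addressing: n = 0, i = 1, no b/p/e bits."""
    if not 0 <= value <= 4095:
        raise ValueError("immediate value needs more than 12 bits; use format 4")
    return f"{opcode | 0x1:02X}0{value:03X}"


def format4(opcode: int, address: int, n: int = 1, i: int = 1, x: int = 0) -> str:
    """Extended format: e = 1 and a 20-bit address field (4 bytes in total)."""
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("address does not fit in 20 bits")
    first_byte = opcode | (n << 1) | i
    flags = (x << 3) | 0x1                    # x b p e, with b = p = 0 and e = 1
    return f"{first_byte:02X}{flags:X}{address:05X}"


if __name__ == "__main__":
    print(immediate_format3(0x00, 3))              # LDA  #3
    print(format4(0x74, 4096, n=0, i=1))           # +LDT #4096
    print(format4(0x48, 0x1036))                   # +JSUB RDREC
```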

Program Relocation

SESSION 6

Program Relocation

Absolute program: a program whose starting address is specified at assembly time.

Program relocation:
• Programs with absolute addresses must be loaded at a specific starting address; the addresses may be invalid if the program is loaded anywhere else.
• A relocatable program can be loaded and executed correctly at any place in memory.

To have relocatable programs:
• The assembler identifies the object records that must be modified.
• The loader modifies these records.

Contd…

Need for Program Relocation:

To increase the productivity of the machine

Want to load and run several programs at the same time (multiprogramming)

Must be able to load programs into memory wherever there is room

Actual starting address of the program is not known until load time

Contd…

Example :

Consider the following instructions

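• "+JSUB RDREC" (extended format, direct address): the assembler inserts the address of RDREC relative to the start of the program, and instructs the loader to add the program's beginning address to the address field of the JSUB instruction at load time.

• "STL RETADR" (PC-relative): no modification is needed, because the displacement stays the same wherever the program is loaded.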

Contd…

Modification record:

• When the assembler generates an address for a symbol, the address inserted into the instruction is relative to the start of the program.

• The assembler also produces a modification record, which stores the location and length of the address field that needs to be modified.

• The loader, on reading the record, adds the beginning address of the loaded program to the address field identified by the record.

Contd…

Instructions that need to be modified:
• The address portion of those instructions that use absolute (direct) addresses.

Instructions that need not be modified:
• Immediate addressing (no memory references)
• Register-to-register instructions (no memory references)
• PC or base-relative addressing (the relative displacement remains the same regardless of the starting address)

Contd…

Modification record
Col. 1     M
Col. 2-7   Starting location of the address field to be modified, relative to the beginning of the program (hex)
Col. 8-9   Length of the address field to be modified, in half-bytes (hex)

Example: the +JSUB RDREC instruction
• "+JSUB RDREC" assembles into 4B101036 and starts at address 0006.
• Modification record: M00000705.
• The load address is to be added to the field at relative address 000007.
• The field to be modified is 5 half-bytes long (20 bits).
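
The following Python sketch illustrates both sides of this mechanism: the assembler emitting a modification record for a format 4 instruction, and a loader applying it. The function names, the bytearray used as the program image, and the load address 5000 are assumptions chosen for the example.

```python
def modification_record(instruction_addr: int) -> str:
    """Record for a format 4 instruction: the 20-bit address field starts in
    the byte after the opcode byte and is 5 half-bytes long."""
    return f"M{instruction_addr + 1:06X}05"


def apply_modification(image: bytearray, field_addr: int,
                       half_bytes: int, load_addr: int) -> None:
    """Add load_addr to the rightmost half_bytes half-bytes of the field that
    starts at field_addr (relative to the beginning of the program)."""
    nbytes = (half_bytes + 1) // 2
    raw = int.from_bytes(image[field_addr:field_addr + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    new_field = ((raw & mask) + load_addr) & mask
    raw = (raw & ~mask) | new_field           # keep the bits outside the field
    image[field_addr:field_addr + nbytes] = raw.to_bytes(nbytes, "big")


if __name__ == "__main__":
    image = bytearray(10)
    image[6:10] = bytes.fromhex("4B101036")   # +JSUB RDREC at relative address 0006
    print(modification_record(0x0006))
    apply_modification(image, 0x0007, 5, 0x5000)
    print(image[6:10].hex().upper())
```

With a hypothetical load address of 5000, the address field 01036 becomes 06036, while the flag half-byte that shares the first byte of the field is left untouched.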

Contd…

Fig 1.6 Examples of Relocation Program

Machine Independent Feature

SESSION 7

Literals

Literal
• An operand whose value appears literally (as a constant) in the instruction.
• Identified by the prefix "=".
• 'C' for characters (1 per byte); 'X' for hexadecimal digits (2 per byte).
• The assembler defines the constant in memory; the operand becomes a reference to this location.

Literal pools
• Literals are assembled into literal pools.
• LTORG creates a literal pool and inserts the literals accumulated so far.
• This keeps literals close to the instructions that use them, so that short (relative) addresses remain valid.

Duplicate literals
• The assembler must recognize duplicate literals and store only one copy of the specified data value.
• Special literals (e.g., =*) must be duplicated.

Literal - Implementation

LITTAB
• Contains the literal name, the operand value and length, and the address assigned to the operand.

Pass 1
• Build LITTAB with the literal name, operand value and length, leaving the address unassigned.
• When an LTORG statement is encountered, assign an address to each literal not yet assigned an address.

Pass 2
• Search LITTAB for each literal operand encountered.
• Generate data values as if by BYTE or WORD statements.
• Generate a modification record for literals that represent an address in the program.
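
A literal table along these lines could be sketched in Python as follows. The class name LiteralTable, its methods, and the LTORG address 002D are assumptions for illustration. The sketch detects duplicates by literal name only, so =C'EOF' and =X'454F46' would be stored twice, and special literals such as =* are not handled.

```python
def literal_bytes(literal: str) -> bytes:
    """Convert =C'...' or =X'...' into the bytes it denotes."""
    kind, text = literal[1], literal[3:-1]
    if kind == "C":
        return text.encode("ascii")           # one byte per character
    if kind == "X":
        return bytes.fromhex(text)            # two hex digits per byte
    raise ValueError(f"unsupported literal {literal}")


class LiteralTable:
    def __init__(self):
        self.entries = {}                     # name -> {"value", "address"}

    def add(self, literal: str) -> None:
        """Pass 1: record a literal operand, leaving its address unassigned."""
        if literal not in self.entries:       # duplicate literals share one entry
            self.entries[literal] = {"value": literal_bytes(literal), "address": None}

    def ltorg(self, locctr: int) -> int:
        """Assign addresses to all pending literals; return the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr


if __name__ == "__main__":
    littab = LiteralTable()
    for operand in ("=C'EOF'", "=X'05'", "=C'EOF'"):
        littab.add(operand)
    end = littab.ltorg(0x002D)
    for name, entry in littab.entries.items():
        print(name, f"{entry['address']:04X}", entry["value"].hex().upper())
    print(f"LOCCTR after pool: {end:04X}")
```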

Symbols

Labels on instructions or data areas

EQU Directive

symbol EQU value

Creates entry in symbol table (SYMTAB) & assigns value to it.

Value may be expression involving constants and symbols previously defined.

ORG Directive

ORG value

Resets LOCCTR to value specified.

Contd…

Examples

Simple constants:

MAXLEN  EQU     4096
        . . .
        +LDT    #MAXLEN

Array of records:

STAB    RESB    1100
        ORG     STAB
SYMBOL  RESB    6
VALUE   RESW    1
FLAGS   RESB    2
        ORG     STAB+1100
        . . .
        LDA     VALUE,X

For an ordinary two-pass assembler, all symbols must be defined during Pass 1. All terms used to specify the value of a new symbol must therefore have been defined previously in the program. Hence, the following sequences cannot be processed by an ordinary two-pass assembler.

Allowed:

ALPHA   RESW    1
BETA    EQU     ALPHA

Disallowed:

BETA    EQU     ALPHA
ALPHA   RESW    1

Disallowed:

        ORG     ALPHA
BYTE1   RESB    1
BYTE2   RESB    1
BYTE3   RESB    1
        ORG
ALPHA   RESB    1

Expressions

An expression may use constants, user-defined symbols and special terms.
• The current value of the location counter is one such special term.

Expressions are classified as absolute expressions or relative expressions.

Absolute expressions
• An absolute expression is independent of program location.
• Expressions that contain only absolute terms are absolute.
• The difference of two relative terms is absolute.
• An absolute expression may contain relative terms, provided the relative terms occur in pairs and the terms in each pair have opposite signs.
• No relative term may enter into a multiplication or division operation.

e.g. MAXLEN EQU BUFEND-BUFFER

Contd…

Relative expressions
• A relative expression depends on program location; its value is relative to the beginning address of the object program.
• All of the relative terms except one can be paired with opposite signs, as described above. The remaining unpaired term must have a positive sign.
• No relative term may enter into a multiplication or division operation.

BUFEND+BUFFER, 100-BUFFER, and 3*BUFFER are neither relative expressions nor absolute expressions.

Expressions that are neither relative nor absolute should be flagged by the assembler as errors. To make this check possible, symbol table entries must be tagged as relative or absolute.

Contd…

Example: consider the following symbols.

RETADR  RESW    1
LENGTH  RESW    1
BUFFER  RESB    4096
BUFEND  EQU     *
MAXLEN  EQU     BUFEND-BUFFER

Symbol  Type  Value
RETADR  R     0030
LENGTH  R     0033
BUFFER  R     0036
BUFEND  R     1036
MAXLEN  A     1000
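
The pairing rule for relative terms can be checked mechanically by counting signs. The Python sketch below is one way to do this under stated assumptions: the function name classify and the (sign, term) representation are hypothetical, and only addition and subtraction are supported, so expressions such as 3*BUFFER cannot be written at all.

```python
def classify(terms, symtab):
    """terms: list of (sign, term) with sign +1 or -1 and term either an int
    constant or a symbol name. symtab maps a name to (type, value), where
    type is 'R' (relative) or 'A' (absolute).
    Returns (type, value) or raises ValueError for an illegal expression."""
    value = 0
    unpaired = 0                              # net count of relative terms
    for sign, term in terms:
        if isinstance(term, int):
            kind, term_value = "A", term
        else:
            kind, term_value = symtab[term]
        value += sign * term_value
        if kind == "R":
            unpaired += sign
    if unpaired == 0:
        return "A", value                     # relative terms cancel in pairs
    if unpaired == 1:
        return "R", value                     # one positive relative term remains
    raise ValueError("expression is neither absolute nor relative")


SYMTAB = {
    "RETADR": ("R", 0x0030), "LENGTH": ("R", 0x0033),
    "BUFFER": ("R", 0x0036), "BUFEND": ("R", 0x1036),
}

if __name__ == "__main__":
    kind, value = classify([(+1, "BUFEND"), (-1, "BUFFER")], SYMTAB)
    print(kind, f"{value:04X}")               # MAXLEN EQU BUFEND-BUFFER
```

BUFEND+BUFFER gives a net count of 2 and 100-BUFFER gives -1, so both are rejected, in line with the examples in the text.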

Program Blocks

Program blocks
• Code segments that are rearranged within a single object program unit.

Control sections
• Code segments that are translated into independent object program units.

USE directive

USE [Block_Name]

• Indicates which portions of the program belong to the various blocks:
  – the default (unnamed) block, or
  – a named block.
• Used to reduce addressing problems in a program.
• The assembler rearranges the blocks logically by assigning addresses; the pieces of each block are brought together when the object program is loaded.
• If no USE statements are included, the entire program belongs to the single default block.

Program Blocks - Implementation 

Pass 1
• Each program block has a separate location counter.
• Each label is assigned an address that is relative to the start of the block that contains it.
• At the end of Pass 1, the latest value of the location counter for each block indicates the length of that block.
• The assembler can then assign to each block a starting address in the object program.

Pass 2
• The address of each symbol is computed by adding the assigned starting address of its block to the relative address of the symbol within that block.

Contd…

Each source line is given a relative address and a block number.

Example: block table

Block name   Block number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000
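
The address column of the block table follows directly from the block lengths. A minimal Python sketch is given below; the function names and the sample symbol (relative address 0003 in CDATA) are hypothetical.

```python
def assign_block_starts(block_lengths: dict) -> dict:
    """End of Pass 1: place the blocks one after another, in order of definition."""
    starts = {}
    address = 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def symbol_address(starts: dict, block: str, relative_addr: int) -> int:
    """Pass 2: block starting address plus the symbol's address within the block."""
    return starts[block] + relative_addr


if __name__ == "__main__":
    lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
    starts = assign_block_starts(lengths)
    for name, start in starts.items():
        print(f"{name:<10}{start:04X}")
    # A hypothetical symbol at relative address 0003 in block CDATA.
    print(f"{symbol_address(starts, 'CDATA', 0x0003):04X}")
```

The sketch relies on Python dicts preserving insertion order, which stands in for the block numbers 0, 1, 2.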

Program Linking

Control sections
• Code segments translated into independent object program units.
• Each section can be loaded and relocated independently.
• A section is made up of one or more related routines.
• Sections must be linked together to form a program.

CSECT Directive

label CSECT Starts and names a new control section.

External Definition and References

External definition

EXTDEF name [, name]

EXTDEF names symbols that are defined in this control section and may be used by other sections

External reference

EXTREF name [,name]

EXTREF names symbols that are used in this control section and are defined elsewhere

Contd…

EXTDEF and EXTREF directives

EXTDEF symbol(,symbol)*

EXTREF symbol(,symbol)*

Example

15    0003  CLOOP   +JSUB   RDREC           4B100000

160   0017          +STCH   BUFFER,X        57900000

190   0028  MAXLEN  WORD    BUFEND-BUFFER   000000

Implementation

The assembler must include information in the object program that will cause the loader to insert the proper values where they are required.

Object file records

Define record
– Col. 1       D
– Col. 2-7     Name of external symbol defined in this control section
– Col. 8-13    Relative address within this control section (hexadecimal)
– Col. 14-73   Repeat information in Col. 2-13 for other external symbols

Refer record
– Col. 1       R
– Col. 2-7     Name of external symbol referred to in this control section
– Col. 8-73    Names of other external reference symbols

Contd…

Modification record (revised)
– Col. 1       M
– Col. 2-7     Starting address of the field to be modified (hexadecimal)
– Col. 8-9     Length of the field to be modified, in half-bytes (hexadecimal)
– Col. 10      Modification flag (+ or -)
– Col. 11-16   External symbol whose value is to be added to or subtracted from the indicated field

Note: the control section name is automatically an external symbol, i.e. it is available for use in Modification records.
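
As an illustration of the revised record, the Python sketch below generates one modification record per external symbol in an operand. The function name and the (sign, symbol) representation are assumptions; symbols are padded to the six columns reserved for them.

```python
def modification_records(field_addr: int, half_bytes: int, terms) -> list:
    """terms: list of (sign, symbol) pairs with sign '+' or '-'.
    One record is produced for each external symbol in the operand."""
    records = []
    for sign, symbol in terms:
        if sign not in "+-" or len(sign) != 1:
            raise ValueError("modification flag must be + or -")
        records.append(f"M{field_addr:06X}{half_bytes:02X}{sign}{symbol:<6.6}")
    return records


if __name__ == "__main__":
    # +JSUB RDREC at 0003: the 5 half-byte address field starts at 0004.
    print(modification_records(0x0004, 5, [("+", "RDREC")]))
    # MAXLEN WORD BUFEND-BUFFER at 0028: the whole word (6 half-bytes) is modified.
    print(modification_records(0x0028, 6, [("+", "BUFEND"), ("-", "BUFFER")]))
```

An expression such as BUFEND-BUFFER needs two records for the same field, one that adds BUFEND and one that subtracts BUFFER, because the loader handles one external symbol per record.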

Assembler Design

An assembler can be designed as:
• a single-pass (one-pass) assembler, or
• a two-pass assembler.

One-pass assembler:
• Does everything in a single pass.
• Cannot directly resolve forward references.

Contd…

Two-pass assembler:
• Does the work in two passes.
• Resolves forward references.

First pass:
• Scans the code
• Validates the tokens
• Creates a symbol table

Second pass:
• Resolves forward references
• Converts the code to machine code

One Pass Assembler

Problems in a one-pass assembler
• Forward references to data items
• Forward references to labels on instructions

Solution
• Data items: require all such areas to be defined before they are referenced.
• Labels on instructions: no good solution.

Two types of one-pass assembler
• Load-and-go: produces code for immediate execution.
• The other: produces code for later execution.

Load-and-go Assembler

Characteristics
• Useful for program development and testing.
• Avoids the overhead of writing the object program out and reading it back.
• Both one-pass and two-pass assemblers can be designed as load-and-go.
• However, a one-pass assembler also avoids the overhead of an additional pass over the source program.
• For a load-and-go assembler, the actual address must be known at assembly time, so an absolute program can be used.

Multi-Pass Assemblers

Restriction on EQU and ORG
• No forward references are allowed, because the symbol's value cannot be defined during the first pass.

Example:

ALPHA   EQU     BETA
BETA    EQU     DELTA
DELTA   RESW    1

• An assembler with two passes cannot resolve these definitions.

Contd…

Resolve forward references with as many passes as needed:
• Portions of the program that involve forward references in symbol definitions are saved during Pass 1.
• Additional passes are made through the stored definitions.
• Finally, a normal Pass 2 is performed.

Example implementation:
• Use linked lists to keep track of the symbols whose values depend on an undefined symbol.
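
The dependency lists mentioned above can be sketched in Python as follows. This is a simplified illustration, not a description of any particular assembler: it handles only definitions of the form "A EQU B", uses Python lists in place of linked lists, and the function name resolve and the value 0030 for DELTA are assumptions.

```python
def resolve(equ_definitions: dict, defined: dict) -> dict:
    """equ_definitions: symbol -> the symbol it is equated to (A EQU B).
    defined: symbols whose values are already known.
    Returns a symbol table with every EQU symbol resolved."""
    symtab = dict(defined)
    # For each not-yet-known symbol, the list of symbols that depend on it.
    waiting = {}
    for symbol, depends_on in equ_definitions.items():
        waiting.setdefault(depends_on, []).append(symbol)

    ready = list(symtab)                      # symbols whose values are known
    while ready:
        known = ready.pop()
        for dependent in waiting.pop(known, []):
            symtab[dependent] = symtab[known] # value can now be filled in
            ready.append(dependent)
    if waiting:
        raise ValueError(f"undefined symbols: {sorted(waiting)}")
    return symtab


if __name__ == "__main__":
    equs = {"ALPHA": "BETA", "BETA": "DELTA"}
    table = resolve(equs, {"DELTA": 0x0030})  # DELTA RESW 1 at a hypothetical address
    for name in ("ALPHA", "BETA", "DELTA"):
        print(f"{name:<6}{table[name]:04X}")
```

Each time a symbol becomes defined, its list of dependents is walked and removed, so the work continues until no definition is left waiting.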

Implementation example: Microsoft MASM Assembler

SEGMENT
– A program is a collection of segments; each segment is defined as belonging to a particular class: CODE, DATA, CONST, STACK
– Segment registers: CS (code), SS (stack), DS (data), ES, FS, GS
– Similar to program blocks in SIC

ASSUME
– e.g. ASSUME ES:DATASEG2
– e.g. MOV AX,DATASEG2
       MOV ES,AX
– Similar to BASE in SIC

Contd…

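JMP with a forward reference
• Near jump: 2 or 3 bytes
• Far jump: 5 bytes
• e.g. JMP TARGET, where TARGET is defined later in the program
• The programmer can warn the assembler of a far jump: JMP FAR PTR TARGET
• The programmer can warn the assembler of a short jump: JMP SHORT TARGET
• Pass 1 reserves 3 bytes for the jump instruction
• A wrong size assumption leads to a phase error

PUBLIC, EXTRN
• Similar to EXTDEF, EXTREF in SIC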
