
Systems Programming Assignment



Master of Computer Application (MCA) – Semester 3

Systems Programming Assignment Set – 1

Que: 1. What is CISC & RISC? Explain their addressing modes.

Ans:

CISC:

A Complex Instruction Set Computer (CISC) supplies a large number of complex instructions at the assembly language level. Assembly language is a low-level computer programming language in which each statement corresponds to a single machine instruction. CISC instructions facilitate the extensive manipulation of low-level computational elements and events such as memory, binary arithmetic, and addressing. The goal of the CISC architectural philosophy is to make microprocessors easy and flexible to program and to provide for more efficient memory use.

The CISC philosophy was unquestioned during the 1960s when the early computing machines such as the popular Digital Equipment Corporation PDP 11 family of minicomputers were being programmed in assembly language and memory was slow and expensive.

CISC machines merely used the then-available technologies to optimize computer performance. Their advantages included the following: (1) A new processor design could incorporate the instruction set of its predecessor as a subset of an ever-growing language – no need to reinvent the wheel, code-wise, with each design cycle. (2) Fewer instructions were needed to implement a particular computing task, which led to lower memory use for program storage and fewer time-consuming instruction fetches from memory. (3) Simpler compilers sufficed, as complex CISC instructions could be written that closely resembled the instructions of high-level languages. In effect, CISC made a computer's assembly language more like a high-level language to begin with, leaving the compiler less to do.

Some disadvantages of the CISC design philosophy are as follows: (1) The first advantage listed above could be viewed as a disadvantage: the incorporation of older instruction sets into new generations of processors tended to force growing complexity. (2) Many specialized CISC instructions

Systems Programming – MC0073 Roll No. 521150974


were not used frequently enough to justify their existence. The existence of each instruction needed to be justified, because each one requires the storage of more microcode in the central processing unit (the final and lowest layer of code translation), which must be built in at some cost. (3) Because each CISC command must be translated by the processor into tens or even hundreds of lines of microcode, it tends to run slower than an equivalent series of simpler commands that do not require so much translation; all translation requires time. (4) Because a CISC machine builds complexity into the processor, where all its various commands must be translated into microcode for actual execution, the design of CISC hardware is more difficult and the CISC design cycle correspondingly long; this means delay in getting to market with a new chip.

The terms CISC and RISC (Reduced Instruction Set Computer) were coined at this time to reflect the widening split in computer-architectural philosophy.

RISC:

The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design philosophy that favors a simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are AVR, PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM’s PowerPC.

· RISC characteristics

- Small number of machine instructions : less than 150

- Small number of addressing modes : less than 4

- Small number of instruction formats : less than 4

- Instructions of the same length : 32 bits (or 64 bits)

- Single cycle execution

- Load / Store architecture

- Large number of GPRs (General Purpose Registers): more than 32

- Hardwired control

- Support for HLL (High Level Language).

RISC and x86

However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel’s x86 platform remains the dominant processor architecture (Intel is facing


increased competition from AMD, but even AMD's processors implement the x86 platform, or a 64-bit superset known as x86-64). There are three main reasons for this. One, the very large base of proprietary PC applications is written for x86, whereas no RISC platform has a similar installed base; this meant PC users were locked into the x86. The second is that, although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel took advantage of its large market by spending vast amounts of money on processor development. Intel could spend many times as much as any RISC manufacturer on improving low-level design and manufacturing. Smaller firms like Cyrix and NexGen could not match that spending, but they realized that they could apply pipelined design philosophies and practices to the x86 architecture – either directly, as in the 6×86 and MII series, or indirectly (via extra decoding stages), as in the Nx586 and AMD K5.

Later, more powerful processors such as the Intel P6 and AMD K6 had similar RISC-like units that executed a stream of micro-operations generated from decoding stages that split most x86 instructions into several pieces. Today, these principles have been further refined and are used by modern x86 processors such as the Intel Core 2 and AMD K8. The first available chip deploying such techniques was the NexGen Nx586, released in 1994 (while the AMD K5 was severely delayed and released in 1995). As of 2007, the x86 designs (whether Intel's or AMD's) are as fast as (if not faster than) the fastest true RISC single-chip solutions available.

Addressing Modes of CISC :

The 68000 (Motorola) addressing modes

· Register to Register,

· Register to Memory,

· Memory to Register, and

· Memory to Memory

The 68000 supports a wide variety of addressing modes.

· Immediate mode – the operand immediately follows the instruction

· Absolute address – the address (in either the "short" 16-bit form or "long" 32-bit form) of the operand immediately follows the instruction

· Program Counter relative with displacement – A displacement value is added to the program counter to calculate the operand’s address. The displacement can be positive or negative.


· Program Counter relative with index and displacement – The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the program counter are added together to get the final address.

· Register direct – The operand is contained in an address or data register.

· Address register indirect – An address register contains the address of the operand.

· Address register indirect with predecrement or postincrement – An address register contains the address of the operand in memory. With the predecrement option set, a predetermined value is subtracted from the register before the (new) address is used. With the postincrement option set, a predetermined value is added to the register after the operation completes.

· Address register indirect with displacement — A displacement value is added to the register’s contents to calculate the operand’s address. The displacement can be positive or negative.

· Address register relative with index and displacement — The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the specified address register are added together to get the final address.

Addressing Modes of the Intel 80x86 Architecture

· Simple Addressing Modes (3)

- Immediate Mode : operand is part of the instruction

- Register Addressing : operand is contained in register

- Direct : operand field of instruction contains effective address

· Register Indirect Mode – contents of register is effective address

- Only the base registers BX, BP and the index registers SI, DI can be used for register indirect addressing. However, for reasons given below, do not use the BP register.

- Register indirect can be used to implement arrays

- Push and Pop instructions are implemented using register indirection with the SP register.


- The DS segment register is used with the BX, SI, and DI registers. However, since the SS segment register is used with BP, using BP for register indirection will access the stack and not the data segment.

· Base + Offset Indirect or Index + Offset Indirect

The effective address is obtained by adding the offset value contained in the operand field of the instruction to the contents of a register.

- Base + Offset Indirect (Index + Offset Indirect) makes use of the Base registers BX and BP (but avoid BP for reasons given above) or the Index registers SI and DI.

- Base + Offset Indirect provides an alternate method for implementing arrays

- Array Implementation – The offset contains a fixed value (usually the address of the zeroth byte in the array) while the contents of the Base register are incremented to compute offset addresses within the array.

- Record Implementation – Fields within records are accessed as fixed offsets from the Base address of the record.

- Syntax for Base + Offset Indirect Addressing. The following are equivalent:

add ax, Table[bx]
add ax, [Table+bx]
add ax, Table+[bx]
add ax, [bx]+Table

· Base + Index + Offset Indirect

The effective address is obtained by adding the contents of a Base register (BX or BP but avoid BP) to the contents of an Index register (SI or DI) plus an offset (operand field of instruction). That is

Eaddr <- C(Base Reg) + C(Index Reg) + Offset

· Relative (branch instructions only): IP <- IP + offset; the target is specified as a displacement relative to the instruction pointer.


Que: 2. Discuss the following:

a. Design Specification of Assembler

b. Design of Single Pass Assembler

Ans:

a.) Design Specification of Assembler

The purpose of a Software Design Specification (SDS) is to define the software that is to meet the functional requirements for the project. It is the stage at which the supplier specifies the detailed design of the software system, produces the program code to realize that design, tests the individual programs and integrates them into the complete software system.

Now, a PCS automation system generally has a collection of standard reusable modules that need to be configured and/or programmed. But unlike a typical IT-type system, the design of these modules is often part of the standard software of the system and need not be detailed in the SDS. A good example of this is a PID controller, where the PID algorithm is not something specifically designed for the project; it may have some documentation in the system's standard manuals, but that is not normally considered to be part of the SDS. There is another class of module that a PCS often contains: application library objects, as distinct from standard software. All software modules, application-specific and system-standard alike, should be under version control.

A programming language that is one step away from machine language. Each assembly language statement is translated into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and undocumented assembly language programs are difficult to maintain. It is hardware dependent; there is a different assembly language for each CPU series.

It Used to All Be Assembly Language

In the past, control programs (operating systems, database managers, etc.) and many applications were written in assembly language to maximize the machine's performance. Today, C/C++ is widely used instead. Like assembly language, C/C++ can manipulate the bits at the machine level, but it is also portable to different computer platforms. There are C/C++ compilers for almost all computers.

Assembly Language Vs. Machine Language


Although often used synonymously, assembly language and machine language are not the same. Assembly language is turned into machine language. For example, the assembly instruction COMPARE A,B is translated into COMPARE contents of memory bytes 2340-2350 with 4567-4577 (where A and B happen to be located). The physical binary format of the machine instruction is specific to the computer it's running in.

They Can Be Quite Different

Assembly languages can be quite different between computers: for example, a routine that changes Fahrenheit to Celsius took 16 lines of code on one minicomputer and 82 lines on one microcomputer.

b.) Design of Single Pass Assembler

An assembler is a program that turns symbols into machine instructions. It is ISA-specific: there is a close correspondence between the symbols and the instruction set – mnemonics for opcodes, labels for memory locations, and additional operations for allocating storage and initializing data.

Each line of a program is one of the following:

•An instruction

•An assembler directive (or pseudo-op)

•A comment

Whitespace (between symbols) and case are ignored. Comments (beginning with “;”) are also ignored.

Assembler Design can be done in:

– Single pass

– Two pass

•Single Pass Assembler:

–Does everything in single pass

–Cannot resolve forward references


The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyze one expression many times but analyze another expression only once.


Que: 3. Discuss the following:

a. Macro Parameters

b. Nested and Recursive Macro Calls and its expansion

c. Flow chart of Design of Macro Preprocessors Implementation

Ans:

a.) Macro Parameters

A macro is a unit of specification for program generation through expansion. In user interfaces, a macro is a shortcut method for invoking a sequence of functions: macros let users turn widely used sequences of menu selections and keystrokes into one command or key combination. For example, pressing the F2 key might cause several menu options to be selected and several dialog box OK buttons to be clicked in a prescribed sequence. A macro facility can also be a special-purpose command language within an application. In assembly language, a macro is a prewritten subroutine that is called for throughout the program; at assembly time, the macro calls are substituted with the actual subroutine, or with instructions that branch to it. The high-level language equivalent is a function.

Keyboard and mouse macros that are created using an application's built-in macro features are sometimes called application macros. They are created by carrying out the sequence once and letting the application record the actions. An underlying macro programming language, most commonly a Scripting language, with direct access to the features of the application may also exist.

The programmers' text editor Emacs follows this idea to a conclusion. In effect, most of the editor is made of macros. Emacs was originally devised as a set of macros in the editing language TECO; it was later ported to dialects of Lisp.

Another programmer's text editor, Vim (a descendant of vi), also has a full implementation of macros. It can record into a register (macro) what a person types on the keyboard, and the recording can be replayed or edited just like VBA macros for Microsoft Office. It also has a scripting language called Vimscript for creating macros.

Visual Basic for Applications (VBA) is a programming language included in Microsoft Office and some other applications. However, its function has evolved from and replaced the macro languages that


were originally included in some of these applications.

With positional parameters, the programmer must be careful to specify the arguments in the proper order. If a macro has a large number of parameters, and only a few of these are given values in a typical invocation, a different form of parameter specification is more useful; this is called keyword parameters.

b. Nested and Recursive Macro Calls and its expansion

Most macro processors allow parameters to be concatenated with other character strings – for example, XA1, XA2, …, XB1, etc., where A and B are parameters. Any symbol that begins with the character '&' and is not a macro instruction parameter is assumed to be a macro-time variable; all such variables are initialized to a value of 0. Invocation of one macro by another is known as a macro within a macro, and is also referred to as a recursive macro call. When a macro invocation statement is recognized, the arguments are stored in ARGTAB according to their position in the argument list. In positional notation, a parameter is marked '?n', where n gives the position of the parameter.

Macro processor replaces each macro instruction with the corresponding group of source statements. These macro instructions allow the programmer to write a shorthand version of a program, and leave the mechanical details to be handled by the macro processor.

Most macro processors can also modify the sequence of statements generated for a macro expansion, depending on the arguments supplied in the macro invocation. Conditional assembly is commonly used to describe this feature; it is also referred to as conditional macro expansion.

Formally, a frame is a procedural macro consisting of frame-text - zero or more lines of ordinary (program) text and frame commands (that are carried out by FT’s frame processor as it manufactures custom programs). Each frame is both a generic component in a hierarchy of nested subassemblies, and a procedure for integrating itself with its subassembly frames (a recursive process that resolves integration conflicts in favor of higher level subassemblies). Macros also make it possible to define data languages that are immediately compiled into code, which means that constructs such as state machines can be implemented in a way that is both natural and efficient.

c. Flow chart of Design of Macro Preprocessors Implementation

In the implementation of assert macros, the assertion code is typically compiled out of release builds to prevent a performance hit at run-time. This is an excellent practice, except that it means that any otherwise useful work performed by the assert statement becomes an undesirable side-effect. Macros have further drawbacks; in-line functions work just as well. Functions don't bomb the program if they are not wrapped in enough parentheses. Functions can be stepped into and


debugged; most debuggers cannot step into macros. Most importantly, function calls always evaluate their parameters only once. (Try using 'x++' in a 'min(,)' macro. Surprise!) The preprocessor is also used to comment out blocks of code from the compiler.

Preprocessor Code

#if TARGET_WINDOWS /* assume TARGET_WINDOWS is 1 for this example */
typedef enum {
    WINDOWS_FLAG_A = 64,
    WINDOWS_FLAG_B = 128
} WinFlags;
#endif

void routine(WinFlags flag)
{
    switch (flag) {
    case WINDOWS_FLAG_A:   /* this compiles fine - WINDOWS_FLAG_A is defined */
        doStuffA();
        break;
    case WINDOWS_FLAG_B:
        doStuffB();
        break;
    }

#ifdef WINDOWS_FLAG_A
    doStuffWin();          /* this isn't included - WINDOWS_FLAG_A is NOT defined */
#endif
    return;
}


Que: 4. Discuss the following:

a. Phases of Compilation

b. Java Compiler and Environment

Ans:

a.) Phases of Compilation

A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is not reasonable, either from a logical point of view or from an implementation point of view, to consider the compilation process as occurring in one single step. For this reason, it is customary to partition the compilation process into a series of sub processes called phases, as shown in the Fig 1.2. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.

The first phase, called the lexical analyzer, or scanner, separates characters of the source language into groups that logically belong together; these groups are called tokens. The usual tokens are keywords, such as DO or IF; identifiers, such as X or NUM; operator symbols, such as <= or +; and punctuation symbols, such as parentheses or commas. The output of the lexical analyzer is a stream of tokens, which is passed to the next phase, the syntax analyzer, or parser. The tokens in this stream can be represented by codes which we may regard as integers. Thus, DO might be represented by 1, + by 2, and “identifier” by 3. In the case of a token like “identifier”, a second quantity, telling which of the identifiers used by the program is represented by this instance of the token, is passed along with the integer code for “identifier”.

The syntax analyzer groups tokens together into syntactic structures. For example, the three tokens representing A + B might be grouped into a syntactic structure called an expression. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree represent strings of tokens that logically belong together.

The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles of intermediate code are possible. One common style uses instructions with one operator and a small number of operands. These instructions can be viewed as simple macros like the macro ADD2. The primary difference between intermediate code and assembly code is that the intermediate code need not specify the registers to be used for each operation.


Code Optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and / or takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and / or space.

The final phase, code generation, produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done. Designing a code generator that produces truly efficient object programs is one of the most difficult parts of compiler design, both practically and theoretically.

The Table-Management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc.). The data structure used to record this information is called a Symbol table.

The Error Handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic, and adjust the information being passed from phase to phase so that each phase can proceed. It is desirable that compilation be completed on flawed programs, at least through the syntax-analysis phase, so that as many errors as possible can be detected in one compilation. Both the table management and error handling routines interact with all phases of the compiler.

b.) Java Compiler and Environment

Anything which converts 'JavaLanguage' to _any_ other form is a Java compiler, as the term is commonly understood. Only something which converts 'JavaLanguage' to 'JVM language' is a Java compiler, as defined by Sun. It is sad that the legal system forces such a distinction of technical terms to be important for political reasons.

Java "compilers" that convert Java source into something executed in the host machine's native environment (Windows, Linux, VMS, OS/9, VxWorks, etc.) are not, by definition, "Java compilers." They are Java converters. This distinction is not simply a matter of semantics; it goes to the very heart of the Java Machine and its usefulness.

There are also virtual CPUs, such as the JavaVirtualMachine. These could have been implemented in hardware, but happen to be implemented in software. In the future we may see JavaVirtualMachines implemented in hardware; then we will have to distinguish between those that are JavaVirtualMachines and those that are true JavaMachines.


Master of Computer Application (MCA) – Semester 3

Systems Programming Assignment Set – 2

Que: 1. Explain the design of a multi-pass assembler.

Ans:

A programming language that is one step away from machine language. Each assembly language statement is translated into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and undocumented assembly language programs are difficult to maintain. It is hardware dependent; there is a different assembly language for each CPU series.

Pass 1

· Assign addresses to all statements in the program

· Save the values assigned to all labels for use in Pass 2

· Perform some processing of assembler directives

Pass 2

· Assemble instructions

· Generate data values defined by BYTE, WORD

· Perform processing of assembler directives not done in Pass 1

· Write the object program and the assembly listing


Que: 2. Explain the following:

a. Basic Assembler Functions

b. Design of Multi-pass (two-pass) Assemblers Implementation

c. Examples: MASM Assembler and SPARC Assembler.

Ans: a. Basic Assembler Functions

Often the assembler cannot generate debug information automatically. This means that you cannot get a source report unless you manually define the necessary debug information; read your assembler documentation for how you might do that. The only debugging info currently needed by OProfile is the line-number/filename-VMA association. When profiling assembly without debugging info you can always get a report for symbols, and optionally for VMA, through opreport -l or opreport -d, but this works only for symbols with the right attributes.

Basic assembler directives

START, END, BYTE, WORD, RESB, RESW

Purpose: read records from the input device (code F1) and copy them to the output device (code 05); at the end of the file, write EOF on the output device, then RSUB to the operating system.

Data transfer (RD, WD): a buffer is used to store each record; buffering is necessary because of the differing I/O rates. The end of each record is marked with a null character (00 hex), and the end of the file is indicated by a zero-length record.

Subroutines (JSUB, RSUB): RDREC and WRREC; save the link register first before a nested jump.

Assembler’s functions

· Convert mnemonic operation codes to their machine language equivalents.

· Convert symbolic operands to their equivalent machine addresses.

· Build the machine instructions in the proper format.

· Convert the data constants to internal machine representations.

· Write the object program and the assembly listing.


b. Design of Multi-pass (two-pass) Assemblers Implementation

A programming language that is one step away from machine language. Each assembly language statement is translated into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and undocumented assembly language programs are difficult to maintain. It is hardware dependent; there is a different assembly language for each CPU series.

Pass 1

· Assign addresses to all statements in the program

· Save the values assigned to all labels for use in Pass 2

· Perform some processing of assembler directives

Pass 2

· Assemble instructions

· Generate data values defined by BYTE, WORD

· Perform processing of assembler directives not done in Pass 1

· Write the object program and the assembly listing

c. Examples: MASM Assembler and SPARC Assembler.

You can assemble this by typing "tasm first [enter] tlink first [enter]" or something like "masm first [enter] link first [enter]". You must have an assembler and the link/tlink program.

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main

.model small : Lines that start with a "." are used to provide the assembler with information. The word(s) behind it say what kind of info. In this case it just tells the assembler the program is small and doesn't need a lot of memory; I'll get back on this later.

.stack : Another line with info. This one tells the assembler that the "stack" segment starts here. The stack is used to store temporary data. It isn't used in this program, but it must be there, because we make an .EXE file and these files MUST have a stack.

.data : Indicates that the data segment starts here and that the stack segment ends there.

.code : Indicates that the code segment starts there and the data segment ends there.

There are very few addressing modes on the SPARC, and they may be used only in certain very restricted combinations. The two main types of SPARC instructions are given below, along with the valid combinations of addressing modes. There are only a few unusual instructions which do not fall into these categories.

1. Arithmetic/Logical/Shift instructions

opcode reg1,reg2,reg3 !reg1 op reg2 -> reg3

2. Load/Store Instructions

opcode [reg1+reg2],reg3 !load/store between reg3 and memory at reg1+reg2


The SPARC code for this subroutine can be written several ways; two possible approaches are given below. (The 'X's in the center line indicate the differences between the two approaches.)

.global prt_sum | .global prt_sum

prt_sum: | prt_sum:

save %sp,-96,%sp | save %sp,-96,%sp

|

clr %l0 | clr %l0

clr %l1 | clr %l1

mov %i0,%l2 X

loop: | loop:

cmp %l0,%i1 | cmp %l0,%i1

bge done | bge done

nop | nop

X sll %l0,2,%l2

ld [%l2],%o0 X ld [%i0+%l2],%o0

add %l1,%o0,%l1 | add %l1,%o0,%l1

add %l2,4,%l2 X

inc %l0 | inc %l0

ba loop | ba loop

nop | nop

done: | done:


Que 3. What is Relocation? Write the relocation algorithm in detail.

Ans: Relocation

One wrinkle that the loader must handle is that the actual location in memory of the library data cannot be known until after the executable and all dynamically linked libraries have been loaded into memory. This is because the memory locations used depend on which specific dynamic libraries have been loaded. It is not possible to depend on the absolute location of the data in the executable, nor even in the library, since conflicts between different libraries would result: if two of them specified the same or overlapping addresses, it would be impossible to use both in the same program.

However, in practice, the shared libraries on most systems do not change often. Therefore, it is possible to compute a likely load address for every shared library on the system before it is needed, and store that information in the libraries and executables. If every shared library that is loaded has undergone this process, then each will load at their predetermined addresses, which speeds up the process of dynamic linking. This optimization is known as prebinding in Mac OS X and prelinking in Linux. Disadvantages of this technique include the time required to precompute these addresses every time the shared libraries change, the inability to use address space layout randomization, and the requirement of sufficient virtual address space for use (a problem that will be alleviated by the adoption of 64-bit architectures, at least for the time being).

An old method was to examine the program at load time and replace all references to data in the libraries with pointers to the appropriate memory locations once all libraries have been loaded. On Windows 3.1 (and some embedded systems such as Texas Instruments calculators), the references to patch were arranged as linked lists, allowing easy enumeration and replacement. Nowadays, most dynamic library systems link a symbol table with blank addresses into the program at compile time. All references to code or data in the library pass through this table, the import directory. At load time the table is modified with the location of the library code/data by the loader/linker. This process is still slow enough to significantly affect the speed of programs that call other programs at a very high rate, such as certain shell scripts.
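A toy model of such an import table can make the mechanism concrete. The library names and addresses below are entirely invented; this only illustrates the "blank slots filled by the loader" idea, not any real OS loader API:

```python
# Toy model of an import table: the program's external calls go through a
# table whose slots are blank at compile time and patched at load time.
import_table = {"lib_sqrt": None, "lib_print": None}   # blank addresses

def loader_bind(table, library_exports):
    """At load time, the loader patches each blank slot with the address
    where the library's code actually landed in memory."""
    for name in table:
        table[name] = library_exports[name]

def call_import(table, name):
    # The program "jumps through" the table instead of using a fixed address,
    # so it works no matter where the library was loaded.
    return table[name]

# Hypothetical addresses where the loader mapped the library's entry points.
loader_bind(import_table, {"lib_sqrt": 0x7F001000, "lib_print": 0x7F001040})
print(hex(call_import(import_table, "lib_sqrt")))   # 0x7f001000
```

The indirection is the price of load-time flexibility: every external call costs one extra lookup, which is the overhead the next paragraph describes.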

The library itself contains a jump table of all the methods within it, known as entry points. Calls into the library "jump through" this table, looking up the location of the code in memory, then calling it. This introduces overhead in calling into the library, but the delay is usually so small as to be negligible.


Relocation Algorithm

1. program_linked_origin := <link origin> from the linker command;

2. For each object module

A. t_origin := translated origin of the object module;

OM_size := size of the object module;

B. relocation_factor := program_linked_origin - t_origin;

C. Read the machine language program into work_area.

D. Read the RELOCTAB of the object module.

E. For each entry in RELOCTAB

i) translated_addr := address in the RELOCTAB entry;

ii) address_in_work_area := address of work_area + translated_addr - t_origin;

iii) Add relocation_factor to the operand address in the word with the address address_in_work_area.

F. program_linked_origin := program_linked_origin + OM_size;

The computations performed in the algorithm follow the scheme described earlier; the only new action is the computation of the work-area address of the word requiring relocation (step 2.E.ii). Step 2.F increments program_linked_origin so that the next object module is granted the next available load address.
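The algorithm can be sketched directly. The object-module representation below (a dict with a translated origin, size, code words, and RELOCTAB) is an illustrative assumption, not a real object-file format:

```python
# Sketch of the relocation algorithm: each object module carries its
# translated origin, size, machine-code words, and a RELOCTAB listing the
# translated addresses of words whose operand addresses need patching.
def relocate(modules, link_origin):
    program_linked_origin = link_origin            # step 1
    for om in modules:                             # step 2
        t_origin = om["t_origin"]                  # step 2.A
        relocation_factor = program_linked_origin - t_origin   # step 2.B
        work_area = list(om["code"])               # step 2.C: copy the code
        for translated_addr in om["reloctab"]:     # step 2.E
            offset = translated_addr - t_origin    # step 2.E.ii (as an index)
            work_area[offset] += relocation_factor # step 2.E.iii
        om["relocated"] = work_area
        om["load_origin"] = program_linked_origin
        program_linked_origin += om["size"]        # step 2.F
    return modules

# Example: one module translated at origin 500, linked at origin 900; the
# words at translated addresses 500 and 502 hold addresses needing +400.
mods = [{"t_origin": 500, "size": 4,
         "code": [501, 7, 503, 9], "reloctab": [500, 502]}]
relocate(mods, 900)
print(mods[0]["relocated"])   # [901, 7, 903, 9]
```

With relocation_factor = 900 - 500 = 400, only the two words named in RELOCTAB are adjusted; the data words 7 and 9 pass through untouched.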


Que 4. Explain the following:

a. YACC Compiler-Compiler

b. Interpreters

c. Compiler writing tools

Ans:

a.) YACC Compiler-Compiler

If you have been programming for any length of time in a Unix environment, you will have encountered the mystical programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are upwardly compatible, so you can use Flex and Bison when trying our examples.

These programs are massively useful, but as with your C compiler, their man page does not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex; however, the Bison man page does not describe how to integrate Lex-generated code with your Bison program. YACC can parse input streams consisting of tokens with certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what "input streams" are; it needs preprocessed tokens. While you can write your own tokenizer, we will leave that entirely up to Lex.

A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers: programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning. As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts.

Example:

%{

#include <stdio.h>

#include <string.h>


void yyerror(const char *str)

{

fprintf(stderr,"error: %s\n",str);

}

int yywrap()

{

return 1;

}

int main()

{

yyparse();

}

%}

%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE


b. Interpreters

A program that executes instructions written in a high-level language. There are two ways to run programs written in a high-level language. The most common is to compile the program; the other method is to pass the program through an interpreter.

An interpreter translates high-level instructions into an intermediate form, which it then executes. In contrast, a compiler translates high-level instructions directly into machine language.

Compiled programs generally run faster than interpreted programs. The advantage of an interpreter, however, is that it does not need to go through the compilation stage during which machine instructions are generated. This process can be time-consuming if the program is long. The interpreter, on the other hand, can immediately execute high-level programs. For this reason, interpreters are sometimes used during the development of a program, when a programmer wants to add small sections at a time and test them quickly. In addition, interpreters are often used in education because they allow students to program interactively.

Both interpreters and compilers are available for most high-level languages. However, BASIC and LISP are especially designed to be executed by an interpreter. In addition, page description languages, such as PostScript, use an interpreter. Every PostScript printer, for example, has a built-in interpreter that executes PostScript instructions.
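As a concrete illustration of "translate to an intermediate form, then execute it directly", here is a minimal interpreter for arithmetic expressions. It is a sketch built on Python's own parser (the `ast` module) rather than any of the languages named above:

```python
import ast

def interpret(expr):
    """Parse the expression into a tree (the intermediate form), then walk
    the tree and execute each node immediately -- no machine code is ever
    generated, which is the defining trait of an interpreter."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp):
            left, right = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Sub):
                return left - right
            if isinstance(node.op, ast.Mult):
                return left * right
            raise ValueError("unsupported operator")
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported node")
    return ev(ast.parse(expr, mode="eval"))

print(interpret("2 + 3 * 4"))   # 14
```

A compiler for the same little language would instead emit code for each node and hand the result to the hardware; the interpreter pays that translation cost again on every run, which is why compiled programs generally run faster.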

c. Compiler writing tools

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

Purdue Compiler-Construction Tool Set tool:

(PCCTS) A highly integrated lexical-analyzer generator and parser generator by Terence J. Parr, Will E. Cohen and Henry G. Dietz of Purdue University. ANTLR (Another Tool for Language Recognition) corresponds to YACC, and DLG (DFA-based Lexical analyzer Generator) functions like Lex. PCCTS has many additional features which make it easier to use for a wide range of translation problems. PCCTS grammars contain specifications for lexical and syntactic analysis with selective backtracking ("infinite lookahead"), semantic predicates, intermediate-form construction and error reporting. Rules may employ Extended BNF (EBNF) grammar constructs and may define parameters, return values, and local variables.

Languages described in PCCTS are recognized via LL(k) parsers constructed in pure, human-readable C code. Selective backtracking is available to handle non-LL(k) constructs. PCCTS parsers may be compiled with a C++ compiler. PCCTS also includes the SORCERER tree-parser generator. The latest version, 1.10, runs under Unix, MS-DOS, OS/2, and Macintosh and is very portable.
