ACA Answer Key Feb 2009



February 2009 Question Paper: Complete Solution & Supplementary Q&A


Question 1 (10 Marks)

- Write notes on any two program flow mechanisms.


Program Flow Mechanisms

- Conventional machines use a control flow mechanism, in which the order of program execution is explicitly stated in the user program.
- Dataflow machines can execute instructions as soon as operand availability is determined.
- Reduction machines trigger an instruction's execution based on the demand for its results.


Comparison of Program Flow Mechanisms

Control flow
- Basic definition: Conventional computation; a token of control indicates when a statement should be executed.
- Advantages: 1. Full control. 2. Complex data and control structures are easily implemented.
- Disadvantages: 1. Less efficient. 2. Difficult in programming. 3. Difficult in preventing run-time errors.

Data flow
- Basic definition: Eager evaluation; statements are executed when all their operands are available.
- Advantages: 1. Very high potential for parallelism. 2. High throughput. 3. Free from side effects.
- Disadvantages: 1. High control overhead. 2. Difficult in manipulating data structures.

Reduction machine
- Basic definition: Lazy evaluation; statements are executed only when their result is required for another computation.
- Advantages: 1. Only required instructions are executed. 2. High degree of parallelism. 3. Easy manipulation of data structures.
- Disadvantages: 1. Time needed to propagate demand tokens.


Control Flow vs. Data Flow

- Control flow machines use shared memory for instructions and data. Since variables are updated by many instructions, there may be side effects on other instructions. These side effects frequently prevent parallel processing. Single-processor systems are inherently sequential.
- Instructions in dataflow machines are unordered and can be executed as soon as their operands are available; data is held in the instructions themselves. Data tokens are passed from an instruction to its dependents to trigger execution.


Data Flow Features

- No need for:
  - shared memory
  - a program counter
  - a control sequencer
- Special mechanisms are required to:
  - detect data availability
  - match data tokens with instructions needing them
  - enable the chain reaction of asynchronous instruction execution
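To make the token-matching idea concrete, below is a minimal Python sketch (the three-instruction program and the token structures are invented for illustration; this is not any particular machine's design) of instructions firing as soon as all of their operand tokens have arrived:

```python
# Minimal dataflow firing sketch: an instruction executes as soon as
# all of its operand tokens are available; there is no program counter.
# Each instruction: operation, number of operands, destination instructions.
program = {
    "i1": {"op": lambda a, b: a + b, "needs": 2, "dest": ["i3"]},
    "i2": {"op": lambda a, b: a * b, "needs": 2, "dest": ["i3"]},
    "i3": {"op": lambda a, b: a - b, "needs": 2, "dest": []},
}

tokens = {}   # instruction id -> operand values received so far

def send(dest, value, results):
    """Deliver a data token; fire the instruction when all operands match."""
    tokens.setdefault(dest, []).append(value)
    instr = program[dest]
    if len(tokens[dest]) == instr["needs"]:       # all operands present
        result = instr["op"](*tokens.pop(dest))   # fire the instruction
        results[dest] = result
        for d in instr["dest"]:                   # propagate result tokens
            send(d, result, results)

results = {}
# Initial tokens enter the machine; execution is driven purely by
# data availability.
send("i1", 3, results); send("i1", 4, results)    # i1 fires: 3 + 4 = 7
send("i2", 2, results); send("i2", 5, results)    # i2 fires: 2 * 5 = 10
print(results)  # {'i1': 7, 'i2': 10, 'i3': -3}
```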


A Dataflow Architecture - 1

- The Arvind machine (MIT) has N PEs and an N-by-N interconnection network.
- Each PE has a token-matching mechanism that dispatches only instructions with data tokens available.
- Each datum is tagged with:
  - the address of the instruction to which it belongs
  - the context in which the instruction is being executed
- Tagged tokens enter a PE through the local path (pipelined), and can also be communicated to other PEs through the routing network.


A Dataflow Architecture - 2

- Instruction address(es) effectively replace the program counter in a control flow machine.
- The context identifier effectively replaces the frame base register in a control flow machine.
- Since the dataflow machine matches the data tags from one instruction with its successors, synchronized instruction execution is implicit.


A Dataflow Architecture - 3

- An I-structure in each PE is provided to eliminate excessive copying of data structures.
- Each word of the I-structure has a two-bit tag indicating whether the value is empty, full, or has pending read requests.
- This is a retreat from the pure dataflow approach.
- Example 2.6 shows a control flow and dataflow comparison.
- Special compiler technology is needed for dataflow machines.


Demand-Driven Mechanisms

- Data-driven machines select instructions for execution based on the availability of their operands; this is essentially a bottom-up approach.
- Demand-driven machines take a top-down approach, attempting to execute the instruction (a demander) that yields the final result. This triggers the execution of instructions that yield its operands, and so forth.
- The demand-driven approach matches naturally with functional programming languages (e.g., LISP and Scheme).
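The difference can be sketched in Python (an illustrative analogy, not a machine model; `expensive` is a stand-in for any costly operand computation): eager, data-driven evaluation computes every operand up front, while lazy, demand-driven evaluation defers work until a result is actually demanded.

```python
# Data-driven (eager): operands are computed before the call is made.
def eager_select(flag, a, b):
    return a if flag else b           # both a and b were already evaluated

# Demand-driven (lazy): operands are passed as thunks, evaluated on demand.
def lazy_select(flag, a_thunk, b_thunk):
    return a_thunk() if flag else b_thunk()

def expensive(x):
    print(f"computing f({x})")
    return x * x

# Eager: both operands run, even though only one result is needed.
eager_select(True, expensive(3), expensive(4))

# Lazy: only the demanded operand is ever computed.
lazy_select(True, lambda: expensive(3), lambda: expensive(4))
# prints "computing f(3)" once; f(4) is never evaluated
```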


Reduction Machine Models

- String-reduction model:
  - each demander gets a separate copy of the expression string to evaluate
  - each reduction step has an operator and embedded references to demand the corresponding operands
  - each operator is suspended while its arguments are evaluated
- Graph-reduction model:
  - the expression graph is reduced by evaluation of branches or subgraphs, possibly in parallel, with demanders given pointers to the results of reductions
  - based on sharing of pointers to arguments; traversal and reversal of pointers continues until constant arguments are encountered
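One way to see the difference between the two models is the following Python analogy (the `Node` thunk wrapper is invented for illustration): string reduction re-evaluates each private copy of a shared subexpression, while graph reduction shares a pointer to one node and reduces it only once.

```python
# String-reduction analogy: each demander holds its own copy of the
# expression, so a shared subexpression is reduced twice.
def subexpr():
    print("reducing subexpression")
    return 6 * 7

string_result = subexpr() + subexpr()       # prints twice

# Graph-reduction analogy: demanders share a pointer to one node;
# the first demand reduces it in place, later demands reuse the value.
class Node:
    def __init__(self, thunk):
        self.thunk, self.value, self.done = thunk, None, False
    def demand(self):
        if not self.done:                   # reduce once, overwrite in place
            self.value, self.done = self.thunk(), True
        return self.value

shared = Node(subexpr)
graph_result = shared.demand() + shared.demand()   # prints once
assert string_result == graph_result == 84
```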


Summary

- Control flow machines give complete control, but are less efficient than other approaches.
- Data flow (eager evaluation) machines have high potential for parallelism and throughput and are free from side effects, but they have high control overhead, lose time waiting for unneeded arguments, and have difficulty manipulating data structures.
- Reduction (lazy evaluation) machines have high parallelism potential, easy manipulation of data structures, and execute only required instructions. But they do not share objects with changing local state, and they do require time to propagate demand tokens.


Question 2 (4+4+2)

- Write notes on the following:
  - Amdahl's law and efficiency of a system
  - Utilization of a system and quality of parallelism
  - Redundancy


Amdahl's Law

- Assume Ri = i, and w (the weights) are (α, 0, ..., 0, 1 − α).
- Basically this means the system is used sequentially (with probability α) or all n processors are used (with probability 1 − α).
- This yields the speedup equation known as Amdahl's law:

    S(n) = n / (1 + (n − 1) α)

- The implication is that the best speedup possible is 1/α, regardless of n, the number of processors.
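A quick numeric check of the law (a minimal Python sketch; the 10% sequential fraction is just an example value):

```python
# Amdahl's law: S(n) = n / (1 + (n - 1) * alpha), where alpha is the
# probability that the system runs sequentially.
def amdahl_speedup(n, alpha):
    return n / (1 + (n - 1) * alpha)

alpha = 0.1                          # example: 10% sequential fraction
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(n, alpha), 2))
# 2 1.82 / 8 4.71 / 64 8.77 / 1024 9.91:
# speedup approaches 1/alpha = 10 no matter how large n grows
```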

System Efficiency 1

- Assume the following definitions:
  - O(n) = total number of unit operations performed by an n-processor system in completing a program P.
  - T(n) = execution time required to execute the program P on an n-processor system.
- O(n) can be considered similar to the total number of instructions executed by the n processors, perhaps scaled by a constant factor.
- If we define O(1) = T(1), then it is logical to expect that T(n) < O(n) when n > 1 if the program P is able to make any use at all of the extra processor(s).


System Efficiency 2

- Clearly, the speedup factor (how much faster the program runs with n processors) can now be expressed as

    S(n) = T(1) / T(n)

  Recall that we expect T(n) < T(1), so S(n) ≥ 1.
- System efficiency is defined as

    E(n) = S(n) / n = T(1) / (n × T(n))

  It indicates the actual degree of speedup achieved in a system as compared with the maximum possible speedup. Thus 1/n ≤ E(n) ≤ 1. The value is 1/n when only one processor is used (regardless of n), and the value is 1 when all processors are fully utilized.


Redundancy

- The redundancy in a parallel computation is defined as

    R(n) = O(n) / O(1)

- What values can R(n) obtain?
  - R(n) = 1 when O(n) = O(1), or when the number of operations performed is independent of the number of processors, n. This is the ideal case.
  - R(n) = n when all processors perform the same number of operations as when only a single processor is used; this implies that n completely redundant computations are performed!
- The R(n) figure indicates to what extent the software parallelism is carried over to the hardware implementation without extra operations being performed.


System Utilization

- System utilization is defined as

    U(n) = R(n) × E(n) = O(n) / (n × T(n))

  It indicates the degree to which the system resources were kept busy during execution of the program.
- Since 1 ≤ R(n) ≤ n and 1/n ≤ E(n) ≤ 1, the best possible value for U(n) is 1, and the worst is 1/n.
- In general: 1/n ≤ E(n) ≤ U(n) ≤ 1 and 1 ≤ R(n) ≤ 1/E(n) ≤ n.


Quality of Parallelism

- The quality of a parallel computation is defined as

    Q(n) = S(n) × E(n) / R(n) = T^3(1) / (n × T^2(n) × O(n))

- This measure is directly related to speedup (S) and efficiency (E), and inversely related to redundancy (R).
- The quality measure is bounded above by the speedup, that is, Q(n) ≤ S(n).
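The relationships among these metrics can be verified numerically, as in the minimal Python sketch below (the values for n, T(1), T(n), and O(n) are invented; any consistent measurements would do):

```python
# Performance metrics for an n-processor run, using the definitions above.
# Example values: O(1) = T(1) = 1000 unit operations, and an 8-processor
# run that takes 180 time units and performs 1200 operations in total.
n, T1, Tn, O1, On = 8, 1000.0, 180.0, 1000.0, 1200.0

S = T1 / Tn          # speedup
E = S / n            # efficiency = T(1) / (n * T(n))
R = On / O1          # redundancy
U = R * E            # utilization = O(n) / (n * T(n))
Q = S * E / R        # quality = T(1)^3 / (n * T(n)^2 * O(n))

print(f"S={S:.3f} E={E:.3f} R={R:.3f} U={U:.3f} Q={Q:.3f}")
# The bounds quoted above all hold for these values:
assert 1 <= S and 1/n <= E <= U <= 1 and 1 <= R <= 1/E <= n and Q <= S
```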


Question 3

- Explain superscalar processors.


Superscalar Processors

- This subclass of RISC processors allows multiple instructions to be issued simultaneously during each cycle.
- The effective CPI of a superscalar processor should be less than that of a generic scalar RISC processor.
- Clock rates of scalar RISC and superscalar RISC machines are similar.
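As a rough numeric illustration (the instruction count, clock rate, and issue width below are invented, and real machines rarely sustain their ideal issue rate):

```python
# Effective CPI comparison: a scalar RISC pipeline ideally approaches
# CPI = 1, while a k-issue superscalar ideally approaches CPI = 1/k.
def exec_time(instructions, cpi, clock_hz):
    return instructions * cpi / clock_hz

N, CLOCK = 1_000_000, 500e6        # 1M instructions, 500 MHz (example)
scalar_cpi = 1.0                   # ideal scalar RISC
superscalar_cpi = 1.0 / 2          # ideal 2-issue superscalar

print("scalar:     ", exec_time(N, scalar_cpi, CLOCK), "s")
print("superscalar:", exec_time(N, superscalar_cpi, CLOCK), "s")
# Same clock rate, lower effective CPI -> roughly half the run time.
```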


Question 4 (10 Marks)

- Explain the cache addressing models.


Cache Addressing Models

- Most systems use a private cache for each processor.
- There is an interconnection network between the caches and main memory.
- Caches are addressed using either a physical address or a virtual address.


Physical Address Caches

- The cache is indexed and tagged with the physical address.
- Cache lookup occurs after address translation in the TLB or MMU (no aliasing).
- After a cache miss, a block is loaded from main memory.
- Either a write-back or a write-through policy is used.
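For a concrete picture of indexing and tagging with the physical address, here is a minimal Python sketch for a hypothetical direct-mapped cache (the block size and set count are invented for illustration):

```python
# Split a 32-bit physical address into tag / index / offset for a
# hypothetical direct-mapped cache: 64-byte blocks, 256 sets (16 KB).
OFFSET_BITS = 6                 # 2^6 = 64-byte block
INDEX_BITS  = 8                 # 2^8 = 256 sets

def split_address(paddr):
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    index  = (paddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
print(hex(tag), hex(index), hex(offset))   # 0x48d2 0xaf 0xd
# A lookup compares `tag` against the tag stored in set `index`;
# on a miss, the 64-byte block containing the address is loaded.
```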


Physical Address Caches

- Advantages:
  - No cache flushing
  - No aliasing problems
  - Simplistic design
  - Requires little intervention from the OS kernel
- Disadvantage:
  - Slowdown in accessing the cache until the MMU/TLB finishes translation


Physical Address Models


Virtual Address Caches

- The cache is indexed or tagged with the virtual address.
- Cache access and MMU translation/validation are performed in parallel.
- The physical address is saved in tags for write-back.
- More efficient access to the cache.


Virtual Address Model


Aliasing Problem

- Different logically addressed data can have the same index/tag in the cache.
- Confusion results if two or more processors access the same physical cache location.
- The cache can be flushed when aliasing occurs, but this leads to slowdown.
- Alternatively, apply special tagging with a process key or with a physical address.
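The problem can be demonstrated with a little address arithmetic (a hypothetical virtually indexed cache; the page, block, and cache sizes below are invented):

```python
# Aliasing sketch: two virtual addresses map to the same physical page,
# but a virtually indexed cache uses index bits above the 4 KB page
# offset, so the aliases land in different cache sets.
PAGE_BITS, OFFSET_BITS, INDEX_BITS = 12, 6, 9   # 4 KB pages, 32 KB cache

def cache_index(addr):
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

va1, va2 = 0x0000_1040, 0x0000_3040   # same page offset (0x040), and
                                      # assume the OS maps both virtual
                                      # pages to one physical frame
print(cache_index(va1), cache_index(va2))   # 65 193: different sets!
# The same physical datum can now live in two cache lines at once,
# which is why flushing or physical/process-key tagging is needed.
```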


Block Placement Schemes

- Performance depends upon cache access patterns, organization, and management policy.
- Blocks in caches are called block frames, denoted Bi (1 ≤ i ≤ m); blocks in main memory are denoted Bj (1 ≤ j ≤ n), with n >> m.