February 2009 Question Paper: Complete Solution & Supplementary Q&A
1/31/2012
Question 1 (10 Marks)
- Write notes on any 2 program flow mechanisms.
Program Flow Mechanisms
- Conventional machines use a control flow mechanism, in which the order of program execution is explicitly stated in the user program.
- Dataflow machines execute instructions by determining operand availability.
- Reduction machines trigger an instruction's execution based on the demand for its results.
Comparison of Program Flow Mechanisms

Control flow:
- Basic definition: conventional computation; a token of control indicates when a statement should be executed.
- Advantages: (1) full control; (2) complex data and control structures are easily implemented.
- Disadvantages: (1) less efficient; (2) difficult in programming; (3) difficult in preventing run-time errors.

Dataflow:
- Basic definition: eager evaluation; statements are executed when all their operands are available.
- Advantages: (1) very high potential for parallelism; (2) high throughput; (3) free from side effects.
- Disadvantages: (1) high control overhead; (2) difficult in manipulating data structures.

Reduction machine:
- Basic definition: lazy evaluation; statements are executed only when their result is required for another computation.
- Advantages: (1) only required instructions are executed; (2) high degree of parallelism; (3) easy manipulation of data structures.
- Disadvantages: (1) time needed to propagate demand tokens.
Control Flow vs. Data Flow
- Control flow machines use shared memory for instructions and data. Since variables are updated by many instructions, there may be side effects on other instructions. These side effects frequently prevent parallel processing. Single-processor systems are inherently sequential.
- Instructions in dataflow machines are unordered and can be executed as soon as their operands are available; data is held in the instructions themselves. Data tokens are passed from an instruction to its dependents to trigger execution.
Data Flow Features
- No need for:
  - shared memory
  - program counter
  - control sequencer
- Special mechanisms are required to:
  - detect data availability
  - match data tokens with instructions needing them
  - enable a chain reaction of asynchronous instruction execution
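As a rough illustration of these mechanisms, here is a toy data-driven simulator (a sketch of the idea only, not any real dataflow ISA; the `Instruction` class and `send_token` helper are hypothetical names of ours). An instruction fires as soon as all of its operand tokens have been matched, and its result propagates as new tokens, triggering the chain reaction:

```python
# Toy data-driven execution: an instruction fires once all of its
# operand tokens have arrived (hypothetical sketch, not a real machine).
class Instruction:
    def __init__(self, name, op, n_operands, consumers):
        self.name = name            # label, e.g. "add"
        self.op = op                # function applied to the operand values
        self.slots = {}             # token-matching store: port -> value
        self.n = n_operands
        self.consumers = consumers  # list of (instruction, port) fed by our result

def send_token(instr, port, value, fired):
    instr.slots[port] = value
    if len(instr.slots) == instr.n:            # all operands matched: fire
        result = instr.op(*[instr.slots[p] for p in sorted(instr.slots)])
        fired.append((instr.name, result))
        for consumer, cport in instr.consumers:  # chain reaction of tokens
            send_token(consumer, cport, result, fired)
        instr.slots = {}

# Evaluate (a + b) * (a - b): mul can only fire after add and sub have fired.
fired = []
mul = Instruction("mul", lambda x, y: x * y, 2, [])
add = Instruction("add", lambda x, y: x + y, 2, [(mul, 0)])
sub = Instruction("sub", lambda x, y: x - y, 2, [(mul, 1)])
for instr in (add, sub):        # the same data tokens a=5, b=3 go to both
    send_token(instr, 0, 5, fired)
    send_token(instr, 1, 3, fired)
print(fired)   # [('add', 8), ('sub', 2), ('mul', 16)]
```

Note that no program counter orders the three instructions; the arrival of tokens alone determines the firing sequence.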
A Dataflow Architecture - 1
- The Arvind machine (MIT) has N PEs and an N-by-N interconnection network.
- Each PE has a token-matching mechanism that dispatches only instructions with data tokens available.
- Each datum is tagged with:
  - the address of the instruction to which it belongs
  - the context in which the instruction is being executed
- Tagged tokens enter a PE through the local path (pipelined), and can also be communicated to other PEs through the routing network.
A Dataflow Architecture - 2
- Instruction address(es) effectively replace the program counter in a control flow machine.
- The context identifier effectively replaces the frame base register in a control flow machine.
- Since the dataflow machine matches the data tags from one instruction with its successors, synchronized instruction execution is implicit.
A Dataflow Architecture - 3
- An I-structure in each PE is provided to eliminate excessive copying of data structures.
- Each word of the I-structure has a two-bit tag indicating whether the value is empty, full, or has pending read requests.
- This is a retreat from the pure dataflow approach.
- Example 2.6 shows a control flow and dataflow comparison.
- Special compiler technology is needed for dataflow machines.
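The I-structure word's behavior can be sketched as follows (a minimal model of ours, not MIT's implementation; the class and method names are hypothetical). A read that arrives before the write is deferred in the "pending" state and satisfied later, which preserves single-assignment semantics without copying:

```python
# Minimal I-structure cell sketch: each word carries a tag that is
# "empty", "full", or "pending" (deferred read requests are queued).
class IWord:
    def __init__(self):
        self.tag = "empty"
        self.value = None
        self.waiters = []          # deferred readers while tag != "full"

    def read(self, callback):
        if self.tag == "full":
            callback(self.value)   # value already written: deliver at once
        else:
            self.tag = "pending"   # queue the read until the write arrives
            self.waiters.append(callback)

    def write(self, value):
        assert self.tag != "full", "single assignment: at most one write"
        self.value, self.tag = value, "full"
        for cb in self.waiters:    # wake all deferred readers
            cb(value)
        self.waiters = []

got = []
w = IWord()
w.read(got.append)    # read before write: deferred, tag becomes "pending"
w.write(42)           # the write satisfies the pending read
w.read(got.append)    # read after write: satisfied immediately
print(got)            # [42, 42]
```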
Demand-Driven Mechanisms
- Data-driven machines select instructions for execution based on the availability of their operands; this is essentially a bottom-up approach.
- Demand-driven machines take a top-down approach, attempting to execute the instruction (a demander) that yields the final result. This triggers the execution of instructions that yield its operands, and so forth.
- The demand-driven approach matches naturally with functional programming languages (e.g. LISP and Scheme).
Reduction Machine Models
- String-reduction model:
  - each demander gets a separate copy of the expression string to evaluate
  - each reduction step has an operator and embedded references to demand the corresponding operands
  - each operator is suspended while its arguments are evaluated
- Graph-reduction model:
  - the expression graph is reduced by evaluation of branches or subgraphs, possibly in parallel, with demanders given pointers to the results of reductions
  - based on sharing of pointers to arguments; traversal and reversal of pointers continues until constant arguments are encountered
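The demand-driven (lazy) evaluation behind both models can be illustrated with thunks (delayed computations); this is a language-level sketch of ours, not a model of any specific reduction machine. An operand's computation runs only when its result is actually demanded:

```python
# Demand-driven sketch: each operand is a thunk that is evaluated only
# when its result is demanded, mirroring lazy (reduction) evaluation.
calls = []   # records which expressions were actually computed

def thunk(name, fn):
    def wrapper():
        calls.append(name)
        return fn()
    return wrapper

a = thunk("a", lambda: 2 + 3)
b = thunk("b", lambda: 10 * 10)   # never demanded, so never evaluated

def select(cond, t_then, t_else):
    # Demand exactly one branch; the other thunk is never forced.
    return t_then() if cond else t_else()

print(select(True, a, b))   # 5
print(calls)                # ['a']: b's computation was never performed
```

Under eager (data-driven) evaluation both `a` and `b` would have been computed before `select` ran; lazy evaluation executes only the required instruction, at the cost of bookkeeping to propagate the demand.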
Summary
- Control flow machines give complete control, but are less efficient than other approaches.
- Data flow (eager evaluation) machines have high potential for parallelism and throughput and are free from side effects, but they have high control overhead, lose time waiting for unneeded arguments, and have difficulty manipulating data structures.
- Reduction (lazy evaluation) machines have high parallelism potential, easy manipulation of data structures, and execute only the required instructions; but they do not share objects with changing local state, and they require time to propagate demand tokens.
Question 2 (4+4+2)
- Write notes on the following:
  - Amdahl's law and efficiency of a system
  - Utilization of a system and quality of parallelism
  - Redundancy
Amdahl's Law
- Assume Ri = i, and that the weights w are (α, 0, …, 0, 1−α).
- Basically this means the system is used sequentially (with probability α) or all n processors are used (with probability 1−α).
- This yields the speedup equation known as Amdahl's law:

  S(n) = n / (1 + (n − 1)α)

- The implication is that the best speedup possible is 1/α, regardless of n, the number of processors.
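A quick numeric sketch of the law in plain Python (α = 0.1 is an arbitrary illustrative value) makes the 1/α ceiling visible:

```python
# Amdahl's law: S(n) = n / (1 + (n - 1) * alpha), where alpha is the
# fraction of the workload that must run on a single processor.
def speedup(n, alpha):
    return n / (1 + (n - 1) * alpha)

alpha = 0.1
for n in (1, 10, 100, 1000):
    print(n, round(speedup(n, alpha), 2))
# 1 1.0
# 10 5.26
# 100 9.17
# 1000 9.91  -- S(n) approaches the 1/alpha = 10 ceiling
```

Even with 1000 processors, a 10% sequential fraction caps the speedup just below 10.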
System Efficiency 1
- Assume the following definitions:
  - O(n) = total number of unit operations performed by an n-processor system in completing a program P.
  - T(n) = execution time required to execute the program P on an n-processor system.
- O(n) can be considered similar to the total number of instructions executed by the n processors, perhaps scaled by a constant factor.
- If we define O(1) = T(1), then it is logical to expect that T(n) < O(n) when n > 1 if the program P is able to make any use at all of the extra processor(s).
System Efficiency 2
- Clearly, the speedup factor (how much faster the program runs with n processors) can now be expressed as

  S(n) = T(1) / T(n)

  Recall that we expect T(n) < T(1), so S(n) ≥ 1.
- System efficiency is defined as

  E(n) = S(n) / n = T(1) / (n × T(n))

  It indicates the actual degree of speedup achieved in a system as compared with the maximum possible speedup. Thus 1/n ≤ E(n) ≤ 1. The value is 1/n when only one processor is used (regardless of n), and the value is 1 when all processors are fully utilized.
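These two definitions are easy to check numerically (the timings below are invented for illustration only):

```python
# S(n) = T(1) / T(n); E(n) = S(n) / n, with 1/n <= E(n) <= 1.
def speedup(t1, tn):
    return t1 / tn

def efficiency(n, t1, tn):
    return speedup(t1, tn) / n

# Hypothetical measurement: 4 processors cut a 100 s run to 40 s.
t1, t4 = 100.0, 40.0
print(speedup(t1, t4))        # 2.5
print(efficiency(4, t1, t4))  # 0.625: we achieved 62.5% of the ideal 4x
```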
Redundancy
- The redundancy in a parallel computation is defined as

  R(n) = O(n) / O(1)

- What values can R(n) take?
  - R(n) = 1 when O(n) = O(1), i.e. when the number of operations performed is independent of the number of processors n. This is the ideal case.
  - R(n) = n when every processor performs the same number of operations as when only a single processor is used; this implies that n completely redundant computations are performed!
- The R(n) figure indicates to what extent the software parallelism is carried over to the hardware implementation without extra operations being performed.
System Utilization
- System utilization is defined as

  U(n) = R(n) × E(n) = O(n) / (n × T(n))

  It indicates the degree to which the system resources were kept busy during execution of the program.
- Since 1 ≤ R(n) ≤ n and 1/n ≤ E(n) ≤ 1, the best possible value for U(n) is 1, and the worst is 1/n:
  - 1/n ≤ E(n) ≤ U(n) ≤ 1
  - 1 ≤ R(n) ≤ 1/E(n) ≤ n
Quality of Parallelism
- The quality of a parallel computation is defined as

  Q(n) = S(n) × E(n) / R(n) = T(1)³ / (n × T(n)² × O(n))

- This measure is directly related to speedup (S) and efficiency (E), and inversely related to redundancy (R).
- The quality measure is bounded above by the speedup (that is, Q(n) ≤ S(n)).
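All five metrics can be computed together from T(1), T(n), and O(n). The numbers below are invented for illustration; by the convention above, O(1) = T(1):

```python
# Speedup, efficiency, redundancy, utilization, and quality from
# T(1), T(n), and O(n), using O(1) = T(1).
def metrics(n, t1, tn, o_n):
    o1 = t1                 # convention: O(1) = T(1)
    S = t1 / tn             # speedup
    E = S / n               # efficiency
    R = o_n / o1            # redundancy
    U = R * E               # utilization = O(n) / (n * T(n))
    Q = S * E / R           # quality = T(1)^3 / (n * T(n)^2 * O(n))
    return S, E, R, U, Q

S, E, R, U, Q = metrics(n=4, t1=100.0, tn=40.0, o_n=120.0)
print(S, E, R, U, Q)        # S=2.5, E=0.625, R=1.2, U=0.75, Q~1.302
# The bound chains from the slides hold: 1/n <= E <= U <= 1 and Q <= S.
assert 1 / 4 <= E <= U <= 1 and Q <= S
```

Here 20% extra operations (R = 1.2) push utilization above efficiency, while quality sits below the raw speedup, as the bounds require.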
Question 3
- Explain superscalar processors.
Superscalar Processors
- This subclass of RISC processors allows multiple instructions to be issued simultaneously during each cycle.
- The effective CPI of a superscalar processor should be less than that of a generic scalar RISC processor.
- Clock rates of scalar RISC and superscalar RISC machines are similar.
Question 4 (10 marks)
- Explain the cache addressing models.
Cache Addressing Models
- Most systems use private caches for each processor.
- An interconnection network sits between the caches and main memory.
- Caches are addressed using either a physical address or a virtual address.
Physical Address Caches
- The cache is indexed and tagged with the physical address.
- Cache lookup occurs after address translation in the TLB or MMU (hence no aliasing).
- After a cache miss, a block is loaded from main memory.
- Either a write-back or a write-through policy is used.
Physical Address Caches
- Advantages:
  - No cache flushing
  - No aliasing problems
  - Simple design
  - Requires little intervention from the OS kernel
- Disadvantage:
  - Slowdown in accessing the cache until the MMU/TLB finishes translation
Physical Address Models (figure)
Virtual Address Caches
- The cache is indexed or tagged with the virtual address.
- Cache access and MMU translation/validation are performed in parallel.
- The physical address is saved in the tags for write-back.
- More efficient access to the cache.
Virtual Address Model (figure)
Aliasing Problem
- Different logically addressed data have the same index/tag in the cache.
- Confusion arises if two or more processors access the same physical cache location.
- Flushing the cache when aliasing occurs avoids the confusion, but leads to slowdown.
- Alternatively, apply special tagging with a process key or with a physical address.
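The problem can be made concrete with a small sketch (all cache parameters here are made-up illustrative values, not from the slides): in a virtually-indexed cache whose index bits extend above the page offset, two virtual synonyms of one physical line can land in different sets, leaving two inconsistent copies.

```python
# Hypothetical 16 KB virtually-indexed cache: 256 sets of 64-byte lines,
# with 4 KB pages. The set index uses virtual-address bits [13:6], which
# include bits 13:12 lying ABOVE the 4 KB page offset.
LINE_BYTES, SETS, PAGE = 64, 256, 4096

def set_index(vaddr):
    return (vaddr // LINE_BYTES) % SETS

# Two virtual addresses the OS has mapped to the SAME physical line
# (same page offset, different virtual pages -- they differ in bit 12):
va1, va2 = 0x4040, 0x5040
print(set_index(va1), set_index(va2))   # 1 65: two sets, two cached copies
# A physically-indexed cache computes one index from the one physical
# address, so the two accesses hit the same line and no alias exists.
```

Writing through `va1` and then reading through `va2` would return stale data unless the hardware flushes or specially tags such lines, which is exactly the trade-off listed above.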
Block Placement Schemes
- Performance depends upon cache access patterns, organization, and management policy.
- Blocks in caches are called block frames, Bi (i ≤ m); blocks in main memory are simply blocks, Bj (j ≤ n), with n >> m.