CSE 8383 - Advanced Computer Architecture
Week 4: Week of Feb 2, 2004
engr.smu.edu/~rewini/8383
Contents
Reservation Table
Latency Analysis
State Diagrams
MAL and its bounds
Delay Insertion
Throughput
Group Work
Introduction to Multiprocessors
Reservation Table
A reservation table displays the time-space flow of data through the pipeline for one function evaluation.
A static pipeline is specified by a single reservation table.
A dynamic pipeline may be specified by multiple reservation tables.
Static Pipeline
[Figure: reservation table with stages S1 to S4 as rows and time as columns; four X marks in total.]
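Dynamic Pipeline
[Figure: two reservation tables over stages S1 to S3. Function X: three marks in S1, two in S2, three in S3. Function Y: two marks in S1, one in S2, three in S3.]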
Reservation Table (Cont.)
The number of columns in a reservation table is called the evaluation time of a given function.
The checkmarks in a row correspond to the time instants (cycles) in which a particular stage is used.
Multiple checkmarks in a row: repeated usage of the same stage in different cycles.
Reservation Table (Cont.)
Contiguous checkmarks in a row: extended usage of a stage over more than one cycle.
Multiple checkmarks in one column: multiple stages are used in parallel.
A dynamic pipeline may allow different initiations to follow a mix of reservation tables.
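The properties above can be read off a reservation table mechanically. The following is a minimal sketch, assuming a table stored as a mapping from stage name to the set of cycles in which that stage is marked. The helper names and the mark positions in `TABLE_X` are illustrative assumptions, not taken from the slides.

```python
# Reservation table: stage name -> set of 1-based cycles in which the stage
# is used. The mark positions below are assumed for illustration only.
TABLE_X = {
    "S1": {1, 6, 8},
    "S2": {2, 4},
    "S3": {3, 5, 7},
}

def evaluation_time(table):
    """Number of columns, assuming the last column holds at least one mark."""
    return max(max(cycles) for cycles in table.values())

def stages_in_cycle(table, cycle):
    """Stages marked in one column; more than one means parallel usage."""
    return sorted(stage for stage, cycles in table.items() if cycle in cycles)

def has_extended_usage(cycles):
    """True if a row has contiguous marks (a stage held for several cycles)."""
    return any(c + 1 in cycles for c in cycles)

print(evaluation_time(TABLE_X))
print(stages_in_cycle(TABLE_X, 4))
print(has_extended_usage(TABLE_X["S1"]))
```

A set per row keeps "is stage S used in cycle t" a constant-time lookup, which is the only query the later analysis steps need.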
Reservation Table
1 2 3 4 5 6 7
A X X X
B X X
C X X
D X
Latency Analysis
The number of cycles between two initiations is the latency between them.
A latency of k: two initiations are separated by k cycles.
Collision: a resource conflict between two initiations.
Forbidden latencies: latencies that cause collisions.
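Forbidden latencies follow directly from the reservation table: two initiations collide when some stage is marked in two cycles whose distance equals the latency. The sketch below assumes the same stage-to-cycles representation; the mark positions in `TABLE_X` and `TABLE_Y` are assumptions chosen to be consistent with the forbidden latencies quoted later in these slides (2, 4, 5, 7 for X and 2, 4 for Y).

```python
from itertools import combinations

# Assumed mark positions (stage -> set of 1-based cycles), for illustration.
TABLE_X = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
TABLE_Y = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}

def forbidden_latencies(table):
    """Distances between any two marks in the same row of the table."""
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)
    return sorted(forbidden)

print(forbidden_latencies(TABLE_X))
print(forbidden_latencies(TABLE_Y))
```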
Collision with latency 2 & 5 in evaluating X
[Figure: two timing diagrams over stages S1 to S3 showing initiations X1 and X2 colliding, one at latency 2 and one at latency 5.]
Latency Analysis (cont.)
Latency sequence: a sequence of permissible latencies between successive initiations.
Latency cycle: a latency sequence that repeats the same subsequence (cycle) indefinitely.
Example: the latency sequence 1, 8 repeated gives the latency cycle (1, 8) = 1, 8, 1, 8, 1, 8, …
Latency Analysis (cont.)
Average latency (of a latency cycle): sum of all latencies / number of latencies along the cycle.
Constant cycle: a latency cycle with only one latency value.
Objective: obtain the shortest average latency between initiations without causing collisions.
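As a rough illustration of the objective, the sketch below computes the average latency of a cycle and checks that the cycle never places two initiations a forbidden distance apart. The function names are hypothetical, and the forbidden set {2, 4, 5, 7} is the one the slides give for function X.

```python
from fractions import Fraction

def average_latency(cycle):
    """Sum of the latencies divided by how many there are."""
    return Fraction(sum(cycle), len(cycle))

def is_collision_free(cycle, forbidden):
    """Check every distance between two initiations of the repeating cycle.

    Distances are sums of consecutive latencies (wrapping around), and only
    sums up to the largest forbidden latency can collide. Assumes a
    non-empty forbidden set and positive latencies.
    """
    limit = max(forbidden)
    for start in range(len(cycle)):
        distance, k = 0, start
        while distance <= limit:
            distance += cycle[k % len(cycle)]
            if distance in forbidden:
                return False
            k += 1
    return True

FORBIDDEN_X = {2, 4, 5, 7}
print(average_latency((1, 8)), is_collision_free((1, 8), FORBIDDEN_X))
print(average_latency((6,)), is_collision_free((6,), FORBIDDEN_X))
print(is_collision_free((1, 1), FORBIDDEN_X))
```

The cycle (1, 1) fails because the first and third initiations end up 2 cycles apart, which is forbidden even though latency 1 itself is not.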
Latency Cycle (1,8)
[Figure: timing diagram over cycles 1 to 21 showing initiations X1 through X6 issued with latency cycle (1, 8).]
Average Latency = (1+8)/2 = 4.5
Latency Cycle (6)
[Figure: timing diagram over cycles 1 to 21 showing initiations X1 through X4 issued with constant latency 6.]
Average Latency = 6
Collision Vector
C = (C_m, C_{m-1}, …, C_2, C_1)
C_i = 1 if latency i causes a collision (forbidden)
C_i = 0 if latency i is permissible
C_m = 1 (always), where m is the maximum forbidden latency
Maximum forbidden latency: m <= n-1, where n is the number of columns in the reservation table
Collision Vector (X after X)
Forbidden Latencies: 2, 4, 5, 7
Collision Vector = 1 0 1 1 0 1 0
Collision Vector (Y after Y)
Forbidden Latencies: 2, 4
Collision Vector = 1 0 1 0
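The two vectors above can be rebuilt from their forbidden latencies. This is a small sketch; `collision_vector` is a hypothetical helper name, and the output is a string written from C_m on the left down to C_1 on the right.

```python
def collision_vector(forbidden):
    """Return the collision vector as a bit string C_m ... C_1.

    Assumes a non-empty collection of forbidden latencies.
    """
    m = max(forbidden)  # the maximum forbidden latency fixes the length
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(collision_vector({2, 4, 5, 7}))
print(collision_vector({2, 4}))
```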
State Diagram
It specifies the permissible state transitions among successive initiations.
The collision vector corresponds to the initial state at time t = 1 (initial collision vector).
The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m.
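Right Shift Register
The next state can be obtained with the help of an m-bit right shift register.
[Figure: shift register with 0s shifted in; a 0 shifted out means it is safe to allow an initiation, a 1 shifted out means a collision.]
Each 1-bit shift corresponds to an increase in the latency by 1.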
The next state
The next state is obtained by bitwise ORing the initial collision vector with the shifted register.
C.V. = 1 0 1 1 0 1 0 (first state)
  0 1 0 1 1 0 1   C.V. 1-bit right shifted
  1 0 1 1 0 1 0   initial C.V.
  ------------- OR
  1 1 1 1 1 1 1
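The shift-and-OR step maps naturally onto integer bit operations. The sketch below assumes the collision vector of X is stored as an integer whose bit i-1 holds C_i, so a right shift by p plays the role of p single-bit shifts of the register. The names `INITIAL`, `M`, `is_permissible`, and `next_state` are illustrative.

```python
INITIAL = 0b1011010  # collision vector of X; bit i-1 holds C_i
M = 7                # maximum forbidden latency (length of the vector)

def is_permissible(state, latency):
    """Latency p is safe if C_p is 0 in the current state, or if p > m."""
    return latency > M or ((state >> (latency - 1)) & 1) == 0

def next_state(state, latency):
    """Shift the state right by the latency, then OR in the initial vector."""
    return (state >> latency) | INITIAL

print(format(next_state(INITIAL, 1), "07b"))
print(format(next_state(INITIAL, 3), "07b"))
print(is_permissible(INITIAL, 2))
```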
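State Diagram for X
States: 1 0 1 1 0 1 0 (initial), 1 1 1 1 1 1 1, 1 0 1 1 0 1 1
From 1 0 1 1 0 1 0: latency 1* to 1 1 1 1 1 1 1; latency 3 or 6 to 1 0 1 1 0 1 1; latency 8+ back to 1 0 1 1 0 1 0
From 1 1 1 1 1 1 1: latency 8+ to 1 0 1 1 0 1 0
From 1 0 1 1 0 1 1: latency 3* or 6 back to 1 0 1 1 0 1 1; latency 8+ to 1 0 1 1 0 1 0
Edges marked * are minimum-latency edges.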
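The state diagram for X can be generated by applying the shift-and-OR rule to every reachable state. The following is a sketch under the assumption that states are integers with bit i-1 holding C_i, and that the single latency m+1 stands in for every latency of m+1 or more (the "8+" edges). `build_state_diagram` is a hypothetical name.

```python
def build_state_diagram(initial, m):
    """Map each reachable state to {latency: next_state}.

    Latency m + 1 represents 'm + 1 or more'; it always returns to the
    initial state because the whole register has been shifted out.
    """
    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        edges = {}
        for p in range(1, m + 2):
            # p is permissible when bit C_p of the state is 0, or when p > m
            if p > m or ((state >> (p - 1)) & 1) == 0:
                edges[p] = (state >> p) | initial
        diagram[state] = edges
        pending.extend(edges.values())
    return diagram

diagram_x = build_state_diagram(0b1011010, 7)
for state, edges in sorted(diagram_x.items()):
    print(format(state, "07b"),
          {p: format(nxt, "07b") for p, nxt in edges.items()})
```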
Cycles
Simple cycles: each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8)
Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states: (1, 8) and (3); one of them gives the MAL
MAL
Minimum Average Latency
At least one of the greedy cycles will lead to the MAL.
Consider the state diagram for Y: the MAL is 3 (see diagram).
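One way to find the greedy cycles and the MAL is to follow the minimum-latency edge out of each state until a state repeats. The sketch below assumes the diagrams are entered by hand as dictionaries, with transitions derived by shift-and-OR from the collision vectors 1011010 (X) and 1010 (Y), and with "8+" and "5+" taken as exactly 8 and 5. All names are illustrative.

```python
from fractions import Fraction

# state -> {latency: next state}; hand-entered, assumed transitions
DIAGRAM_X = {
    "1011010": {1: "1111111", 3: "1011011", 6: "1011011", 8: "1011010"},
    "1111111": {8: "1011010"},
    "1011011": {3: "1011011", 6: "1011011", 8: "1011010"},
}
DIAGRAM_Y = {
    "1010": {1: "1111", 3: "1011", 5: "1010"},
    "1111": {5: "1010"},
    "1011": {3: "1011", 5: "1010"},
}

def greedy_cycles(diagram):
    """Cycles reached by always taking the smallest latency out of a state."""
    cycles = set()
    for start in diagram:
        path, position = [], {}
        state = start
        while state not in position:
            position[state] = len(path)
            latency = min(diagram[state])
            path.append(latency)
            state = diagram[state][latency]
        loop = path[position[state]:]  # drop any lead-in before the loop
        rotations = [tuple(loop[i:] + loop[:i]) for i in range(len(loop))]
        cycles.add(min(rotations))     # one canonical rotation per cycle
    return sorted(cycles)

def minimum_average_latency(diagram):
    """Smallest average latency over the greedy cycles."""
    return min(Fraction(sum(c), len(c)) for c in greedy_cycles(diagram))

print(greedy_cycles(DIAGRAM_X), minimum_average_latency(DIAGRAM_X))
print(greedy_cycles(DIAGRAM_Y), minimum_average_latency(DIAGRAM_Y))
```

Searching only greedy cycles relies on the statement above that at least one greedy cycle attains the MAL; it is not an exhaustive search over all simple cycles.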
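State Diagram for Y
States: 1 0 1 0 (initial), 1 1 1 1, 1 0 1 1
From 1 0 1 0: latency 1* to 1 1 1 1; latency 3 to 1 0 1 1; latency 5+ back to 1 0 1 0
From 1 1 1 1: latency 5+ to 1 0 1 0
From 1 0 1 1: latency 3* back to 1 0 1 1; latency 5+ to 1 0 1 0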
Bounds on the MAL
The MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table. (Shar, 1972)
The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram. (Shar, 1972)
The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1. This is also an upper bound on the MAL. (Shar, 1972)
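Both outer bounds can be computed from the reservation table alone, before any state diagram is drawn. A minimal sketch follows; the mark positions in the two tables are assumptions consistent with the forbidden latencies the slides give for X and Y, and `mal_bounds` is a hypothetical name.

```python
# Assumed mark positions (stage -> set of 1-based cycles), for illustration.
TABLE_X = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
TABLE_Y = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}

def mal_bounds(table):
    """Return (lower, upper) bounds on the MAL.

    lower: maximum number of marks in any row.
    upper: number of 1's in the initial collision vector, plus 1. The 1's
           are exactly the distinct forbidden latencies.
    """
    lower = max(len(cycles) for cycles in table.values())
    forbidden = {b - a
                 for cycles in table.values()
                 for a in cycles for b in cycles if b > a}
    return lower, len(forbidden) + 1

print(mal_bounds(TABLE_X))
print(mal_bounds(TABLE_Y))
```

For both functions the MAL of 3 quoted in the slides sits at the lower bound, so no cycle can do better for these tables.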
Delay Insertion
The purpose is to modify the reservation table, yielding a new collision vector.
This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL.
Example
[Figure: three-stage pipeline with stages S1, S2, S3 and an output.]
Example (Cont.)
1 2 3 4 5
S1 X X
S2 X X
S3 X X
Forbidden Latencies: 1, 2, 4
C.V. 1 0 1 1
Example (Cont.)
State Diagram
[Figure: single state 1 0 1 1 with two self-loops labeled 3* and 5+.]
MAL = 3
Example (Cont.)
[Figure: the same pipeline with stages S1, S2, S3 and an output, now with delay elements D1 and D2 inserted.]
Example (Cont.)
1 2 3 4 5 6 7
S1 X X
S2 X X
S3 X X
D1 X
D2 X
Forbidden Latencies: 2, 6
C.V. 1 0 0 0 1 0
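The effect of the two delays can be checked by recomputing the forbidden latencies before and after insertion. The sketch below is only illustrative: the mark positions in `BEFORE` and `AFTER` are assumptions chosen to reproduce the forbidden latencies stated in the example (1, 2, 4 and then 2, 6), and `summarize` is a hypothetical helper.

```python
# Assumed mark positions (stage -> set of 1-based cycles), for illustration.
BEFORE = {"S1": {1, 5}, "S2": {2, 4}, "S3": {3, 4}}
AFTER = {"S1": {1, 7}, "S2": {2, 4}, "S3": {3, 5}, "D1": {4}, "D2": {6}}

def summarize(table):
    """Return (forbidden latencies, collision vector, lower bound on MAL)."""
    forbidden = sorted({b - a
                        for cycles in table.values()
                        for a in cycles for b in cycles if b > a})
    m = forbidden[-1]
    vector = "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))
    lower_bound = max(len(cycles) for cycles in table.values())
    return forbidden, vector, lower_bound

print(summarize(BEFORE))
print(summarize(AFTER))
```

The lower bound stays at 2 in both tables, while the original MAL is 3; removing latency 1 from the forbidden set is what gives the modified pipeline room to approach that bound.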
Group Activity 1
Find the State Diagram
Pipeline Throughput
The average number of task initiations per clock cycle.
The inverse of the MAL.
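As a worked illustration, throughput per cycle is 1 / MAL, and dividing by the clock period converts it to initiations per second. The sketch below assumes a MAL of 3 cycles and a 20 ns clock period purely as example inputs; the function names are hypothetical.

```python
def throughput_per_cycle(mal):
    """Average initiations per clock cycle: the inverse of the MAL."""
    return 1 / mal

def throughput_per_second(mal, clock_period_ns):
    """Average initiations per second for a given clock period in ns."""
    return 1e9 / (mal * clock_period_ns)

# Example inputs (assumed): MAL = 3 cycles, clock period = 20 ns
print(throughput_per_cycle(3))
print(throughput_per_second(3, 20))
```

With these inputs one initiation starts every 60 ns on average, which is roughly 16.7 million initiations per second.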
Group Activity 2
1 2 3 4
S1 X X
S2 X
S3 X
Find:
C.V.
State Diagram
Simple Cycles
Greedy Cycles
MAL
Throughput (t = 20 ns)
Multiprocessors
Introduction
Uniprocessor systems are not capable of delivering solutions to some problems in reasonable time.
Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution.
Speed-up versus Quality-up
Architecture Background
Three major components:
Processors
Memory Modules
Interconnection Network
Parallel and Distributed Computers
MIMD Shared Memory: Bus based, Switch based, CC-NUMA
MIMD Distributed Memory
SIMD Computers
Clusters
Grid Computing
MIMD Shared Memory Systems
[Figure: processors (P) connected to memory modules (M) through interconnection networks.]
Bus Based & switch based SM Systems
[Figure: two shared-memory organizations built from processors (P) with caches (C); one shows a global memory, the other shows memory modules (M).]
Cache Coherent NUMA
[Figure: four nodes, each with a memory (M), cache (C), and processor (P), connected by an interconnection network.]
MIMD Distributed Memory Systems
[Figure: four processors (P), each with its own memory (M), connected through interconnection networks.]
SIMD Computers
[Figure: a von Neumann computer (processor and memory) alongside an array of sixteen processor-memory (PM) elements linked by some interconnection network.]
Clusters
[Figure: three nodes, each with memory (M), cache (C), processor (P), I/O, and OS, joined by an interconnection network, with middleware and a programming environment shown as shared layers.]
Grids
Grids are geographically distributed platforms for computation.
They provide dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.
Interconnection Network Taxonomy
Interconnection Network
  Static: 1-D, 2-D, HC
  Dynamic:
    Bus-based: Single, Multiple
    Switch-based: SS, MS, Crossbar