 SIMD and Associative Computational Models Parallel & Distributed Algorithms

SIMD Models PDA Sp07


8/8/2019 SIMD Models PDA Sp07

http://slidepdf.com/reader/full/simd-models-pda-sp07 1/41

 

SIMD and Associative

Computational Models

Parallel & Distributed Algorithms


SIMD and Associative

Computational Models

Part I: SIMD Model


Flynn’s Taxonomy

• The best known classification scheme for parallel computers.

• Classifies machines by the parallelism they exhibit in their
 – Instruction streams
 – Data streams
• A sequence of instructions (the instruction stream) manipulates a sequence of operands (the data stream).
• The instruction stream (I) and the data stream (D) can each be either single (S) or multiple (M).

• Four combinations: SISD, SIMD, MISD, MIMD


Flynn’s Taxonomy (cont.)

• SISD
 – Single Instruction Stream, Single Data Stream
 – The most important member is the sequential computer.
 – Some argue that other models are included as well.
• SIMD
 – Single Instruction Stream, Multiple Data Streams
 – One of the two most important classes in Flynn’s Taxonomy
• MISD
 – Multiple Instruction Streams, Single Data Stream
 – Relatively unused terminology. Some argue that this includes pipeline computing.
• MIMD
 – Multiple Instruction Streams, Multiple Data Streams
 – An important classification in Flynn’s Taxonomy


The SIMD Computer & Model

Consists of two types of processors:
• A front end, or control unit
 – Stores a copy of the program
 – Has a program control unit to execute the program
 – Broadcasts parallel program instructions to the array of processors
• An array of simple processors that are functionally more like ALUs
 – They do not store a copy of the program, nor do they have a program control unit.
 – They execute in parallel the commands sent by the front end.


SIMD (cont.)

• On a memory access, all active processors must access the same location in their local memory.
• All active processors execute the same instruction synchronously, but on different data.
• The sequence of different data items is often referred to as a vector.


Alternate Names for SIMDs

• Recall that all active processors of a SIMD

computer must simultaneously access the same

memory location.

• The value in the i-th processor can be viewed as the i-th component of a vector.
• SIMD machines are sometimes called vector computers [Jordan, et al.] or processor arrays [Quinn 94, 04] based on their ability to execute vector and matrix operations efficiently.


Alternate Names (cont.)

• In particular, Quinn, the textbook for this course, calls a SIMD a processor array.
• Quinn and a few others also consider a pipelined vector processor to be a SIMD.
 – This is a somewhat non-standard use of the term.
 – An example is the Cray-1.


How to View a SIMD Machine

• Think of soldiers all in a unit.

• A commander selects certain soldiers as

active – for example, every even

numbered row.

• The commander barks out an order that all

the active soldiers should do and they

execute the order synchronously.


SIMD Execution Style

 – Collectively, the individual memories of the processing elements (PEs) store the (vector) data that is processed in parallel.
 – When the front end encounters an instruction whose operand is a vector, it issues a command to the PEs to perform the instruction in parallel.
 – Although the PEs execute in parallel, some units can be allowed to skip any particular instruction.


SIMD Computers

• SIMD computers that focus on vector operations
 – Support some vector and possibly matrix operations in hardware
 – Usually limit or provide less support for non-vector type operations involving data in the “vector components”
• General purpose SIMD computers
 – Support more traditional type operations (e.g., other than for vector/matrix data types)
 – Usually also provide some vector and possibly matrix operations in hardware


Possible Architecture for a

Generic SIMD


Interconnection Networks for 

SIMDs

• No specific interconnection network is

specified.

• The 2D mesh has been used more frequently than others.

• Even hybrid networks (e.g., cube

connected cycles) have been used.


Example of a 2-D Processor 

Interconnection Network in a SIMD

Each VLSI chip has 16 processing elements (PEs).

Each PE can simultaneously send a value to a specific neighbor (e.g., its left neighbor).
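A minimal Python sketch of the neighbor communication described here: every PE sends its value to its left neighbor in a single lockstep step. The 4 × 4 arrangement of the 16 PEs, the function name `shift_left`, and the choice to give the rightmost column a fill value (no wraparound) are assumptions made for illustration, not details from the slide.

```python
def shift_left(grid, fill=None):
    """Simulate one lockstep step in which every PE sends its value left.

    grid is a list of rows; grid[r][c] is the value held by the PE at
    row r, column c. After the step, the PE in column c holds the value
    that was in column c + 1. The rightmost PE of each row has no right
    neighbor, so it receives `fill` (an assumption: no wraparound links).
    """
    return [row[1:] + [fill] for row in grid]


# Hypothetical 4 x 4 chip: PE (r, c) initially holds the value 4*r + c.
chip = [[4 * r + c for c in range(4)] for r in range(4)]
shifted = shift_left(chip)
print(shifted[0])  # first row after the shift
```

Because the destination is fixed by the instruction itself, no PE needs to attach routing information to the value it sends.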


SIMD Execution Style

• The traditional (SIMD, vector, processor array) execution style ([Quinn 94, pg 62], [Quinn 2004, pgs 37-43]):
 – The sequential processor that broadcasts the commands to the rest of the processors is called the front end or control unit.
 – The front end is a general purpose CPU that stores the program and the data that is not manipulated in parallel.
 – The front end normally executes the sequential portions of the program.
 – Each processing element has a local memory that cannot be directly accessed by the host or other processing elements.


SIMD Execution Style

 – Collectively, the individual memories of the processing elements (PEs) store the (vector) data that is processed in parallel.
 – When the front end encounters an instruction whose operand is a vector, it issues a command to the PEs to perform the instruction in parallel.
 – Although the PEs execute in parallel, some units can be allowed to skip any particular instruction.


Masking on Processor Arrays

• All the processors work in lockstep except those that are masked out (by setting the mask register).
• The parallel if-then-else is frequently used in SIMDs to set masks:
 – Every active processor tests whether its data meets the negation of the boolean condition.
 – If it does, it sets its mask bit, so those processors will not participate in the operation initially.
 – Next, the unmasked processors execute the THEN part.
 – Afterwards, the mask bits (for the original set of active processors) are flipped and the unmasked processors perform the ELSE part.
• Note: this differs from the sequential version of “if”, where only one of the two branches is executed.
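The masking steps listed above can be sketched in Python as follows. This is a sequential simulation under stated assumptions, not the hardware mechanism: the function name `parallel_if_then_else`, the list-based mask, and the sample operations are all hypothetical, and the inner loops stand in for what the PEs would do simultaneously.

```python
def parallel_if_then_else(data, active, cond, then_op, else_op):
    """Simulate a SIMD if-then-else over one value per PE.

    data[i]   : value held by PE i
    active[i] : True if PE i was active before the statement
    cond      : boolean test applied by each active PE to its own value
    """
    data = list(data)

    # Each active PE tests the NEGATION of the condition and sets its
    # mask bit if the negation holds, so it sits out the THEN part.
    masked = [a and not cond(x) for x, a in zip(data, active)]

    # THEN part: executed only by active, unmasked PEs.
    for i in range(len(data)):
        if active[i] and not masked[i]:
            data[i] = then_op(data[i])

    # Flip the mask bits, but only within the original active set.
    masked = [a and not m for a, m in zip(active, masked)]

    # ELSE part: executed by the PEs that were masked during THEN.
    for i in range(len(data)):
        if active[i] and not masked[i]:
            data[i] = else_op(data[i])

    return data


values = [1, 2, 3, 4, 5, 6]
active = [True, True, True, True, True, False]  # last PE was already inactive
result = parallel_if_then_else(
    values, active,
    cond=lambda x: x % 2 == 0,     # "is my value even?"
    then_op=lambda x: x * 10,
    else_op=lambda x: x + 1,
)
print(result)  # [2, 20, 4, 40, 6, 6]
```

Both branches are stepped through one after the other by the whole array, and each PE takes part in only one of them. The condition is evaluated once, before the THEN part changes any data.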


if (COND) then A else B


if (COND) then A else B


if (COND) then A else B


Data Parallelism

(A strength for SIMDs)

• All tasks (or processors) apply the same set of operations to different data.
• Example:

    for i ← 0 to 99 do
        a[i] ← b[i] + c[i]
    endfor

• Accomplished on SIMDs by having all active processors execute the operations synchronously.
• MIMDs can also handle data parallel execution, but must synchronize more frequently.
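As a rough illustration of the data parallel loop on this slide, the Python sketch below treats index i as PE i, so that the whole loop corresponds to one broadcast "add" instruction. The function name `simd_vector_add` and the optional `active` mask are assumptions for illustration; on a real SIMD machine the additions would happen simultaneously rather than in a Python loop.

```python
def simd_vector_add(b, c, active=None):
    """Element-wise a[i] = b[i] + c[i], with one element per simulated PE.

    `active` optionally marks which PEs take part; inactive PEs leave
    their result slot as None (an illustrative convention only).
    """
    if len(b) != len(c):
        raise ValueError("b and c must have the same length")
    if active is None:
        active = [True] * len(b)
    # One conceptual instruction: every active PE adds its own operands.
    return [bi + ci if on else None for bi, ci, on in zip(b, c, active)]


b = list(range(100))             # b[i] = i
c = [2 * i for i in range(100)]  # c[i] = 2i
a = simd_vector_add(b, c)
print(a[:5])  # [0, 3, 6, 9, 12]
```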


Functional/Control/Job Parallelism

(A Strictly-MIMD Paradigm)

• Independent tasks apply different operations to

different data elements

• First and second statements execute concurrently

• Third and fourth statements execute concurrently

    a ← 2
    b ← 3
    m ← (a + b) / 2
    s ← (a² + b²) / 2
    v ← s − m²
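The dependency structure of this example can be sketched with Python's standard `concurrent.futures` module: the mean m and the mean of squares s are independent tasks and may run concurrently, while v must wait for both. The helper names and the use of a two-worker thread pool are illustrative assumptions, standing in for two MIMD processors.

```python
from concurrent.futures import ThreadPoolExecutor


def mean(a, b):
    """Third statement: m = (a + b) / 2."""
    return (a + b) / 2


def mean_of_squares(a, b):
    """Fourth statement: s = (a^2 + b^2) / 2."""
    return (a ** 2 + b ** 2) / 2


def variance_of_pair(a, b):
    """Compute v = s - m^2, running the two independent tasks concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        m_future = pool.submit(mean, a, b)             # different operation...
        s_future = pool.submit(mean_of_squares, a, b)  # ...submitted alongside it
        m = m_future.result()
        s = s_future.result()
    # The last statement depends on both results, so it runs afterwards.
    return s - m ** 2


print(variance_of_pair(2, 3))  # 0.25
```

With a = 2 and b = 3 this gives m = 2.5, s = 6.5, and v = 6.5 − 6.25 = 0.25.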


SIMD Machines

• An early SIMD computer designed for vector and matrix processing was the Illiac IV computer
 – Built at the University of Illinois
 – See Jordan et al., pg 7
• The MPP, the DAP, the Connection Machines CM-1 and CM-2, and the MasPar MP-1 and MP-2 are examples of SIMD computers
 – See Akl pg 8-12 and [Quinn, 94]


SIMD Machines

• Quinn [1994, pg 63-67] discusses the CM-2 Connection Machine and a smaller, updated CM-200.
• Professor Batcher was the chief architect for the STARAN and the MPP (Massively Parallel Processor) and an advisor for the ASPRO.
 – ASPRO is a small second generation STARAN used by the Navy in spy planes.
• Professor Batcher is best known architecturally for the MPP, which is held by the Smithsonian Institution and is currently displayed at a D.C. airport.


Today’s SIMDs

• Many SIMDs are being embedded in SISD machines.
• Others are being built as part of hybrid architectures.
• Others are being built as special purpose machines, although some of them could be classified as general purpose.
• Much of the recent work with SIMD architectures is proprietary.


A Company Building Inexpensive

SIMD

WorldScape is producing a COTS (commodity off-the-shelf) SIMD.
• Not a traditional SIMD, as
 – The PEs are full-fledged CPUs
 – The hardware doesn’t synchronize every step
• Hardware design supports efficient synchronization.
• Their machine is programmed like a SIMD.
• The U.S. Navy has observed that their machines process radar an order of magnitude faster than others.
• There is quite a bit of information about their work at http://www.wscape.com


Hybrid Architecture

[Figure: Systola 1024 boards connected through a high speed Myrinet switch]

Combines the SIMD and MIMD paradigms within a parallel architecture ⇒ Hybrid Computer


Architecture of Systola 1024

[Figure: block diagram labeled ISA, interface processors, RAM NORTH, RAM WEST, controller, program memory, and host computer bus]

• Instruction Systolic Array:
 – 32 × 32 mesh of processing elements
 – Wavefront instruction execution


SIMDs Embedded in SISDs

• Intel's Pentium 4 includes what they call MMX technology to gain a significant performance boost.
• IBM and Motorola incorporated the technology into their G4 PowerPC chip in what they call their Velocity Engine.
• Both MMX technology and the Velocity Engine are the chip manufacturers' names for their proprietary SIMD processors and parallel extensions to their operating code.
• This same approach is used by NVidia and Evans & Sutherland to dramatically accelerate graphics rendering.


Special Purpose SIMDs in the Bioinformatics Arena

• Paracel
 – Acquired by Celera Genomics in 2000
 – Products include the sequence supercomputer GeneMatcher, which has a high throughput sequence analysis capability
   • Supported over a million processors earlier
 – GeneMatcher was used by Celera in their race with the U.S. government to complete the description of the human genome sequence
• TimeLogic, Inc.
 – Has DeCypher, a reconfigurable SIMD


Advantages of SIMDs

• Reference: [Roosta, pg 10]

• Less hardware than MIMDs, as they have only one control unit.
 – Control units are complex.
• Less memory needed than MIMDs
 – Only one copy of the instructions needs to be stored.
 – Allows more data to be stored in memory.
• Less startup time in communicating between PEs.


Advantages of SIMDs

• Single instruction stream and synchronization of PEs make SIMD applications easier to program, understand, & debug.
 – Similar to sequential programming
• Control flow operations and scalar operations can be executed on the control unit while PEs are executing other instructions.
• MIMD architectures require explicit synchronization primitives, which create a substantial amount of additional overhead.


Advantages of SIMDs

• During a communication operation between PEs,
 – PEs send data to a neighboring PE in parallel and in lock step.
 – There is no need to create a header with routing information, as “routing” is determined by program steps.
 – The entire communication operation is executed synchronously.
 – A tight (worst case) upper bound for the time for this operation can be computed.
• Less complex hardware in SIMDs, since no message decoder is needed in the PEs
 – MIMDs need a message decoder in each PE.


SIMD Shortcomings
(with some rebuttals)

• Claims are from our textbook by Quinn.
 – Similar statements are found in one of our “primary reference books”, by Grama et al. [13].
• Claim 1: Not all problems are data-parallel
 – While true, most problems seem to have data parallel solutions.
 – In [Fox, et al.], the observation was made in their study of large parallel applications that most were data parallel by nature, but often had points where significant branching occurred.


SIMD Shortcomings
(with some rebuttals)

• Claim 2: Speed drops for conditionally executed branches
 – Processors in both MIMDs & SIMDs normally have to do a significant amount of ‘condition’ testing.
 – MIMD processors can execute multiple branches concurrently.
 – For an if-then-else statement with execution times for the “then” and “else” parts being roughly equal, about ½ of the SIMD processors are idle during its execution.
   • With additional branching, the average number of inactive processors can become even higher.
   • With SIMDs, only one of these branches can be executed at a time.
   • This reason justifies the study of multiple SIMDs (or MSIMDs).
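The "about ½ idle" figure can be checked with a small calculation. The Python sketch below assumes a simple cost model that is not in the slides: the array spends `then_time` on the THEN part and `else_time` on the ELSE part, one after the other, and each PE is busy only during the branch it takes. The function names and the fully nested branching model in `nested_utilization` are illustrative assumptions.

```python
def simd_utilization(then_time, else_time, then_fraction):
    """Average fraction of PE time spent busy during one if-then-else.

    then_fraction is the share of active PEs that take the THEN branch;
    the rest take ELSE. The array's elapsed time is then_time + else_time
    because the two branches are executed one after the other.
    """
    busy_per_pe = then_fraction * then_time + (1 - then_fraction) * else_time
    elapsed = then_time + else_time
    return busy_per_pe / elapsed


def nested_utilization(depth):
    """Utilization when equal-cost if-then-else statements are nested
    `depth` levels deep inside both branches (assumed worst-case model):
    2**depth leaf paths run serially and each PE is busy in one of them."""
    return 0.5 ** depth


# Equal branch times: half the PE time is idle regardless of the split.
print(simd_utilization(10, 10, 0.3))
# Three fully nested levels under the assumed model.
print(nested_utilization(3))
```

When the two branch times are equal, the utilization works out to ½ whatever fraction of PEs takes each branch, which matches the slide's estimate.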


SIMD Shortcomings

(with some rebuttals)

• Claim 2 (cont.): Speed drops for conditionally executed code
 – In [Fox, et al.], the observation was made that for the real applications surveyed, the MAXIMUM number of active branches at any point in time was about 8.
 – The cost of the extremely simple processors used in a SIMD is extremely low.
   • Programmers used to worry about ‘full utilization of memory’ but stopped this after memory cost became insignificant overall.


SIMD Shortcomings
(with some rebuttals)

• Claim 3: Don’t adapt to multiple users well.
 – This is true to some degree for all parallel computers.
 – If usage of a parallel processor is dedicated to an important problem, it is probably best not to risk compromising its performance by ‘sharing’.
 – This reason also justifies the study of multiple SIMDs (or MSIMDs).
 – SIMD architecture has not received the attention that MIMD has received and can greatly benefit from further research.


SIMD Shortcomings
(with some rebuttals)

• Claim 4: Do not scale down well to

“starter” systems that are affordable.

 – This point is arguable and its ‘truth’ is likely to

vary rapidly over time

 – WorldScape/ClearSpeed currently sells a very

economical SIMD board that plugs into a PC.


SIMD Shortcomings
(with some rebuttals)

Claim 5: Requires customized VLSI for processors, and the expense of control units has dropped
• Reliance on COTS (commodity off-the-shelf) parts has dropped the price of MIMDs.
• Expense of PCs (with control units) has dropped significantly.
• However, reliance on COTS has fueled the success of ‘low level parallelism’ provided by clusters and restricted new innovative parallel architecture research for well over a decade.
